Coverage

Per-source scraper health for the OpenCaseLaw Swiss legal corpus, refreshed nightly by the publish pipeline (03:00 UTC). The table below lists every active scraper, the decisions it added in the last run, the total it holds, and, where the source portal publishes one, the portal's own count, so that any gap is visible.

Summary counters: Last run, Healthy, No new docs, Failing, Total decisions

Source | Status + last run | Total ours | Portal | Gap | Duration

Data source: /api/scraper-health (run_scraper.py). A non-zero gap usually means a publication delay at the source portal, not a missed scrape. Persistent failures are tracked in the issue tracker. For institutional collaboration on coverage gaps, write to team@jonashertner.com.
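As a rough illustration of how the Gap column can be read, the sketch below computes a gap as the portal's published count minus our own total, and reports no gap when the portal publishes no count. The sign convention, the `ScraperRow` type, and its field names are assumptions made for this example; they are not taken from the /api/scraper-health response.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScraperRow:
    """One table row. Field names are hypothetical, not the API's schema."""
    source: str
    total_ours: int
    portal_total: Optional[int]  # None when the portal publishes no count


def portal_gap(row: ScraperRow) -> Optional[int]:
    """Return portal count minus our count, or None if there is no portal count."""
    if row.portal_total is None:
        return None
    return row.portal_total - row.total_ours


def format_gap(row: ScraperRow) -> str:
    """Render the gap for display, using 'n/a' when no comparison is possible."""
    gap = portal_gap(row)
    return "n/a" if gap is None else str(gap)
```

Under this convention a positive value means the portal lists more decisions than the corpus holds, and a row without a portal count has nothing to compare.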

§ Frozen archives

Supplementary frozen archives

Beyond the live scrapers above, the corpus also incorporates seven frozen archives sourced via entscheidsuche.ch, an independent open-access aggregator that mirrors many Swiss court portals. We re-ingest entscheidsuche's JSON archives once a week (Sunday 22:00 UTC) as a completeness backstop. The last six runs added zero new decisions, because the upstream portals these shards reflect have either stopped producing new content or been replaced. The records are preserved as historical snapshots inside our corpus.

Shard | Records | Upstream status
vd_findinfo | 74,866 | Historical VD jurisprudence portal (jurisprudence.findinfo.ch), now retired; current decisions are covered by the direct vd_gerichte scraper.
vd_omni | 28,033 | Historical VD aggregation; superseded by current portals.
ch_vb | 23,055 | Federal Council (Bundesrat) publications (amtsdruckschriften.bar.admin.ch): Recueil officiel, Postulate, Botschaften. Mixed parliamentary and administrative content, not court decisions in the strict sense.
sg_gerichte | 12,572 | Predecessor of sg_publikationen (live, 12,690 records direct-scraped).
be_bvd | 2,094 | BE Behörden-Verwaltungs-Direktion. Administrative tribunal; no live portal endpoint.
be_weitere | 840 | BE miscellaneous administrative bodies.
be_steuerrekurs | 343 | Steuerrekurskommission BE. A direct scraper exists, but the upstream portal database has been disconnected since February 2026 and returns 0 results. We are tracking the issue with the canton.

All other entscheidsuche shards (~40 of 51) are shadowed by an independent direct scraper against the source portal. Where both exist, build-time deduplication treats our direct data as the source of record. Entscheidsuche.ch is acknowledged in the dataset card as a contributing upstream for the frozen-archive set, and its re-aggregation work is credited with CC BY-style attribution.
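The precedence rule described above can be sketched as a keyed merge in which a direct-scraped record replaces an archive record with the same identifier. This is a minimal sketch, not the actual build step: the `decision_id` key, the `origin` field, and the function name are hypothetical, and the real pipeline may match duplicates on different criteria.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def dedup_prefer_direct(direct: Iterable[Record],
                        archive: Iterable[Record]) -> List[Record]:
    """Merge two record sets, keeping the direct record on any ID collision.

    Assumes every record carries a 'decision_id' key (hypothetical name).
    """
    merged: Dict[str, Record] = {}
    # Load archive records first so they act only as a fallback.
    for rec in archive:
        merged[rec["decision_id"]] = rec
    # Direct records overwrite archive records that share an ID.
    for rec in direct:
        merged[rec["decision_id"]] = rec
    return list(merged.values())
```

With this ordering, an archive record survives only when no direct scraper produced a record with the same identifier, which matches the role of the frozen archives as a completeness backstop.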