Operational · scrapers
Coverage
Per-source scraper health for the OpenCaseLaw Swiss legal corpus, refreshed nightly by the publish pipeline (03:00 UTC). The table below shows every active scraper, the decisions it added in the last run, the total it holds, and — where the source portal exposes it — a comparison against the portal's published count so any gap is visible.
| Source | Status | + last run | Total ours | Portal | Gap | Duration |
|---|---|---|---|---|---|---|
Data source: /api/scraper-health (run_scraper.py). A non-zero gap usually means a publication delay at the source portal, not a missed scrape. Persistent failures are tracked in the issue tracker. For institutional collaboration on coverage gaps, write to team@jonashertner.com.
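As a rough illustration of how the Gap column can be read, here is a minimal sketch that assumes the gap is the portal's published count minus our total, and that it is left empty when the portal exposes no count. The field names (`source`, `total_ours`, `portal_count`) and the sample numbers are hypothetical; the actual schema returned by /api/scraper-health is not shown on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScraperRow:
    # Hypothetical field names; the real /api/scraper-health schema may differ.
    source: str
    total_ours: int
    portal_count: Optional[int]  # None when the portal publishes no count


def coverage_gap(row: ScraperRow) -> Optional[int]:
    """Return portal count minus our total, or None if the portal exposes no count."""
    if row.portal_count is None:
        return None
    return row.portal_count - row.total_ours


# Invented sample rows, for illustration only.
rows = [
    ScraperRow("example_a", total_ours=1000, portal_count=1004),
    ScraperRow("example_b", total_ours=500, portal_count=None),
]
gaps = {r.source: coverage_gap(r) for r in rows}
```

Under this assumed sign convention, a positive gap means the portal lists more decisions than we currently hold, which fits the publication-delay reading above.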
§ Frozen archives
Supplementary frozen archives
Beyond the live scrapers above, the corpus also incorporates seven frozen archives sourced via entscheidsuche.ch — an independent open-access aggregator that mirrors many Swiss court portals. We re-ingest entscheidsuche's JSON archives once a week (Sunday 22:00 UTC) as a completeness backstop; the last six runs added zero new decisions because the upstream portals these shards reflect are no longer producing new content or have been replaced. The records are preserved as historical snapshots inside our corpus.
| Shard | Records | Upstream status |
|---|---|---|
| vd_findinfo | 74,866 | Historical VD jurisprudence portal (jurisprudence.findinfo.ch) retired; current decisions covered by direct vd_gerichte scraper. |
| vd_omni | 28,033 | Historical VD aggregation; superseded by current portals. |
| ch_vb | 23,055 | Federal Bundesrat publications (amtsdruckschriften.bar.admin.ch): Recueil officiel, Postulate, Botschaften. Mixed parliamentary/admin content, not court decisions in the strict sense. |
| sg_gerichte | 12,572 | Predecessor of sg_publikationen (live, 12,690 records direct-scraped). |
| be_bvd | 2,094 | BE Behörden-Verwaltungs-Direktion. Administrative tribunal; no live portal endpoint. |
| be_weitere | 840 | BE miscellaneous administrative bodies. |
| be_steuerrekurs | 343 | Steuerrekurskommission BE. Direct scraper exists; upstream portal database disconnected since February 2026, returns 0 results. We are tracking the issue with the canton. |
All other entscheidsuche shards (~40 of 51) are shadowed by an independent direct scraper against the source portal. Where both exist, build-time deduplication uses our direct data as the source of record. Entscheidsuche.ch is acknowledged in the dataset card as a contributing upstream for the frozen-archive set; their re-aggregation work is credited with CC BY-style attribution.
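The "direct data as the source of record" rule can be sketched roughly as follows. This is an illustrative sketch only, not the actual build step: the record shape (`decision_id`, `origin`, `text`) and the function name `dedup_prefer_direct` are assumptions made for the example.

```python
from typing import Dict, Iterable, List


def dedup_prefer_direct(records: Iterable[dict]) -> List[dict]:
    """Keep one record per decision_id; a direct-scraper record beats an aggregator one.

    Hypothetical record shape: {"decision_id": str, "origin": str, ...},
    where origin is "direct" or "entscheidsuche".
    """
    chosen: Dict[str, dict] = {}
    for rec in records:
        key = rec["decision_id"]
        current = chosen.get(key)
        # Take the first record seen, and replace it only when a direct
        # record arrives for a key currently held by a non-direct record.
        if current is None or (
            rec["origin"] == "direct" and current["origin"] != "direct"
        ):
            chosen[key] = rec
    return list(chosen.values())


# Invented sample records, for illustration only.
records = [
    {"decision_id": "A-1", "origin": "entscheidsuche", "text": "aggregated copy"},
    {"decision_id": "A-1", "origin": "direct", "text": "direct copy"},
    {"decision_id": "B-2", "origin": "entscheidsuche", "text": "archive only"},
]
result = dedup_prefer_direct(records)
```

In this sketch a decision present only in an entscheidsuche shard is kept unchanged, which is how the frozen archives would continue to contribute records that no live scraper covers.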