Research preprint · March 2026

OpenCaseLaw: An Open Dataset and Search Platform for Swiss Court Decisions

Jonas Hertner · arXiv preprint v1 · March 20, 2026 · MIT (code) · CC0 (data)

Abstract

We present OpenCaseLaw, an open corpus and retrieval stack for Swiss case law. In the repository snapshot generated on March 20, 2026, the dataset contains 962,724 decisions from 102 federal, cantonal, and regulatory courts or public bodies, covering all 26 cantons and the period 1875–2026. The current snapshot contains 448,461 German decisions (46.6 %), 434,663 French decisions (45.1 %), and 79,600 Italian decisions (8.3 %); the export schema also reserves a Romansh language code. OpenCaseLaw releases a 34-field Parquet export together with tooling to build a local SQLite FTS5 index and reference database; in the March 20, 2026 graph build, that database contains 8.76 million extracted case-citation references, 6.42 million resolved in-corpus decision links, and 11.23 million decision-statute links. We describe the collection, normalization, deduplication, export, and retrieval pipeline, and release a multilingual benchmark harness with 100 tagged evaluation queries plus a release-matched offline baseline report. The code is MIT-licensed, and each record links back to the decision as published by the originating court or public body.
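To make the retrieval layer concrete, the following is a minimal sketch of a SQLite FTS5 index with BM25 ranking, using only Python's standard sqlite3 module. It is not the project's build tooling: the table name decisions_fts, the column names, and the three sample rows are hypothetical, and the sketch assumes the local SQLite build ships with FTS5 enabled.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS5 index from (decision_id, language, text) rows.

    Table and column names are illustrative, not the released schema.
    """
    conn = sqlite3.connect(":memory:")
    # UNINDEXED columns are stored and returned but not tokenized for search.
    conn.execute(
        "CREATE VIRTUAL TABLE decisions_fts USING fts5("
        "decision_id UNINDEXED, language UNINDEXED, full_text)"
    )
    conn.executemany("INSERT INTO decisions_fts VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def search(conn, query, limit=10):
    """Return (decision_id, score) pairs, best match first.

    FTS5's bm25() yields smaller (more negative) values for better matches,
    so an ascending sort puts the best hit at the top.
    """
    sql = (
        "SELECT decision_id, bm25(decisions_fts) AS score "
        "FROM decisions_fts WHERE decisions_fts MATCH ? "
        "ORDER BY score LIMIT ?"
    )
    return conn.execute(sql, (query, limit)).fetchall()


# Hypothetical sample rows for illustration only.
rows = [
    ("A-1", "de", "Kündigung des Mietvertrags wegen Eigenbedarf"),
    ("B-2", "fr", "Résiliation du bail pour besoin propre"),
    ("C-3", "de", "Haftung des Arbeitgebers bei Unfall"),
]
conn = build_index(rows)
hits_de = search(conn, "Mietvertrags")
hits_fr = search(conn, "bail")
```

Keeping identifiers in UNINDEXED columns lets a hit be joined back to the full record without adding those values to the full-text vocabulary.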

Contributions

  • Open corpus. 971,992 published Swiss court decisions (snapshot 2026-05-13) across 121 federal, regulatory, international, and cantonal courts. Released as a 34-field Parquet export under CC0 — the largest open dataset of Swiss case law.
  • Open retrieval stack. SQLite FTS5 index with an atomic-rebuild pipeline. Retrieval combines BM25, reciprocal rank fusion (RRF), structured-LLM parsing, and Haiku reranking. The frozen offline baseline (March 19, 2026) reaches MRR 0.470; the online configuration with the LLM reaches 0.647. Both are reproducible from benchmarks/.
  • Citation graph. 8.76 M extracted case-citation references; 6.42 M resolved to in-corpus decisions (about 73 % link rate). 11.23 M extracted decision-statute links across 281 K distinct provisions.
  • Open MCP surface. 29 specialised tools at mcp.opencaselaw.ch — search, citation graph, statute lookup, doctrine timelines, multilingual commentary, exam-question generation. Free, no API keys, OpenAPI 3.0.3.
  • Multilingual benchmark. 100 tagged queries (74 de · 16 fr · 7 it · 3 cross-lingual) across 15 legal domains, citation-graph verified. Plus a 50-question cross-lingual leading-case retrieval set with parallel DE/FR/IT phrasings.
  • Quality framework. Four layers: 61 dataset checks, 46 pytest checks, an L4 production smoke test, and a Step 6c gate that blocks git push on critical regressions. Public dashboard at opencaselaw.ch/quality.html.
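For readers unfamiliar with the two acronyms in the retrieval bullet, here is a minimal sketch of reciprocal rank fusion (RRF) and mean reciprocal rank (MRR). It illustrates the standard definitions only, not the project's implementation: the smoothing constant k = 60 is the conventional default and is an assumption here, as are the document and query identifiers.

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    k = 60 is the commonly used default, assumed here.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def mean_reciprocal_rank(results, relevant):
    """Average of 1 / rank of the first relevant hit per query (0 if none)."""
    total = 0.0
    for query_id, ranking in results.items():
        for rank, doc_id in enumerate(ranking, start=1):
            if doc_id in relevant[query_id]:
                total += 1.0 / rank
                break
    return total / len(results)


# Hypothetical runs from two retrievers for the same query.
lexical_run = ["d1", "d2", "d3"]
other_run = ["d3", "d1", "d4"]
fused = rrf_fuse([lexical_run, other_run])

# Hypothetical evaluation: q1 finds its target at rank 2, q2 misses entirely.
results = {"q1": ["d1", "d3"], "q2": ["d2", "d5", "d4"]}
relevant = {"q1": {"d3"}, "q2": {"d9"}}
mrr = mean_reciprocal_rank(results, relevant)
```

RRF uses only rank positions, so it can combine retrievers whose raw scores are not comparable. Any rank cutoff applied when computing the reported MRR figures is not stated here, so the sketch applies none.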

Cite

Until the paper appears on arXiv, please cite as:

@misc{hertner2026opencaselaw,
  author       = {Hertner, Jonas},
  title        = {OpenCaseLaw: An Open Dataset and Search Platform for Swiss Court Decisions},
  year         = {2026},
  howpublished = {Preprint, March 2026},
  url          = {https://opencaselaw.ch/paper/opencaselaw-arxiv-final.pdf},
  note         = {Snapshot 2026-05-13: 971,992 decisions; live corpus ~972,000+}
}

Reproducibility

Every numerical claim in the paper resolves to a commit-pinned artifact in the public repository. The 100-question multilingual benchmark lives at benchmarks/swiss_legal_rag_bench/v1.jsonl with the 50-question cross-lingual companion at benchmarks/swiss_legal_rag_bench/cross_lingual_v1.jsonl. Frozen offline retrieval results are pinned at benchmarks/search_benchmark_2026-03-19_offline_full.json. The dataset quality framework lives in quality/; its public dashboard is at opencaselaw.ch/quality.html.
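One way to spot-check a benchmark file against the stated query counts is to tally its records by tag. The sketch below does this for any iterable of JSONL lines. The field name "lang" and the sample records are hypothetical, since the file's schema is not described here; the field name would need to be adjusted to match the actual file.

```python
import json
from collections import Counter


def tally_by_field(lines, field="lang"):
    """Count JSONL records per value of `field`.

    `field` defaults to "lang", a hypothetical name; records lacking the
    field are counted under "<missing>".
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        counts[record.get(field, "<missing>")] += 1
    return counts


# Hypothetical sample records standing in for the benchmark file.
sample = [
    '{"id": "q1", "lang": "de", "query": "Mietrecht Eigenbedarf"}',
    '{"id": "q2", "lang": "fr", "query": "bail besoin propre"}',
    '{"id": "q3", "lang": "de", "query": "Haftung Arbeitgeber"}',
    '',
    '{"id": "q4", "query": "no language tag"}',
]
counts = tally_by_field(sample)
```

Against the real file, the same function can be called on an open file handle, for example `tally_by_field(open(path, encoding="utf-8"))`, and the totals compared with the published 74 de, 16 fr, 7 it, and 3 cross-lingual split.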

Contact

Questions, collaboration, peer review, citation chains: team@jonashertner.com. Issue tracker: GitHub issues.