opencaselaw.ch

Research preprint · March 2026

OpenCaseLaw: An Open Dataset and Search Platform for Swiss Court Decisions

Jonas Hertner · arXiv preprint v1 · March 20, 2026 · MIT (code) · CC0 (data)
Download PDF (final) · arXiv submission PDF · Source code (MIT) · Dataset (CC0)

Abstract

We present OpenCaseLaw, an open corpus and retrieval stack for Swiss case law. In the repository snapshot generated on March 20, 2026, the dataset contains 962,724 decisions from 102 federal, cantonal, and regulatory courts or public bodies, covering all 26 cantons and the period 1875–2026. The current snapshot contains 448,461 German decisions (46.6 %), 434,663 French decisions (45.1 %), and 79,600 Italian decisions (8.3 %); the export schema also reserves a Romansh language code. OpenCaseLaw releases a 34-field Parquet export together with tooling to build a local SQLite FTS5 index and reference database; in the March 20, 2026 graph build, that database contains 8.76 million extracted case-citation references, 6.42 million resolved in-corpus decision links, and 11.23 million decision-statute links. We describe the collection, normalization, deduplication, export, and retrieval pipeline, and release a multilingual benchmark harness with 100 tagged evaluation queries plus a release-matched offline baseline report. The code is MIT-licensed, and each record links back to the decision as published by the originating court or public body.
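As a rough illustration of what a local SQLite FTS5 index over decisions can look like, the following minimal sketch builds an in-memory index and runs a ranked full-text query. It is not the project's build tooling: the table name `decisions_fts`, the column names, and the three sample rows are hypothetical and are not taken from the 34-field export schema. It also assumes the Python installation's SQLite library was compiled with FTS5, which is common but not guaranteed.

```python
import sqlite3

# Hypothetical sample rows: identifiers, column layout, and texts are
# illustrative only and do not come from the OpenCaseLaw export.
SAMPLE_DECISIONS = [
    ("demo-001", "de", "Das Gericht weist die Beschwerde ab."),
    ("demo-002", "fr", "Le tribunal rejette le recours."),
    ("demo-003", "it", "Il tribunale respinge il ricorso."),
]


def build_index(rows):
    """Create an in-memory FTS5 table and load the given rows into it."""
    conn = sqlite3.connect(":memory:")
    # decision_id and language are stored but not tokenized (UNINDEXED);
    # only full_text is searchable.
    conn.execute(
        "CREATE VIRTUAL TABLE decisions_fts USING fts5("
        "decision_id UNINDEXED, language UNINDEXED, full_text)"
    )
    conn.executemany("INSERT INTO decisions_fts VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def search(conn, query, limit=10):
    """Return (decision_id, language) pairs ordered by FTS5's built-in rank."""
    cursor = conn.execute(
        "SELECT decision_id, language FROM decisions_fts "
        "WHERE decisions_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return cursor.fetchall()


conn = build_index(SAMPLE_DECISIONS)
hits = search(conn, "recours")
```

With the sample rows above, `hits` holds the single French record. A real index would be built from the Parquet export rather than from inline tuples, and would likely need per-language tokenizer choices that this sketch does not attempt.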


Cite

Until the paper appears on arXiv, please cite as:

@misc{hertner2026opencaselaw,
  author       = {Hertner, Jonas},
  title        = {OpenCaseLaw: An Open Dataset and Search Platform for Swiss Court Decisions},
  year         = {2026},
  howpublished = {Preprint, March 2026},
  url          = {https://opencaselaw.ch/paper/opencaselaw-arxiv-final.pdf},
  note         = {Snapshot 2026-03-20: 962,724 decisions; current corpus 971,067 (May 2026)}
}

Materials

paper.pdf (PDF · final draft · ~20 pp): The full paper; the internal cold-read pass is complete.
submission.pdf (PDF · arXiv-ready): Submission-formatted PDF (NeurIPS D&B template).
source.tar.gz (LaTeX source · figures · BibTeX): Full reproducible source bundle.
paper.md (Markdown render): Plain-text version for diffing and web reading.
github.com/jonashertner/caselaw-repo-1 (Source code · MIT): All scrapers, the retrieval stack, the MCP server, and the build pipeline.
voilaj/swiss-caselaw (Hugging Face · CC0 · ~7 GB Parquet): Daily-refreshed Parquet export of the full corpus.

Reproducibility

Every numerical claim in the paper resolves to a commit-pinned artifact in the public repository. The 100-question multilingual benchmark lives at benchmarks/swiss_legal_rag_bench/v1.jsonl, and its 50-question cross-lingual companion at benchmarks/swiss_legal_rag_bench/cross_lingual_v1.jsonl. Frozen offline retrieval results are pinned at benchmarks/search_benchmark_2026-03-19_offline_full.json. The dataset quality framework lives in quality/; its public dashboard is at /quality.html.
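To sanity-check a local checkout against the counts stated above, a reader could load a benchmark file and count its records. The sketch below is a minimal JSONL reader, assuming only that each non-empty line is one JSON object. The helper names `load_jsonl` and `summarize` are illustrative, and no field names of the benchmark schema are assumed.

```python
import json
from collections import Counter
from pathlib import Path


def load_jsonl(path):
    """Parse one JSON object per non-empty line of the file at `path`."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from exc
    return records


def summarize(records, field):
    """Count the values of `field`; records lacking it are tallied under None."""
    return Counter(record.get(field) for record in records)
```

Run from the repository root, `len(load_jsonl("benchmarks/swiss_legal_rag_bench/v1.jsonl"))` should equal 100 if the file matches the description above. `summarize` takes whatever field name the file actually uses, for example a tag or language key if one exists; that name has to be read from the data, since it is not stated here.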

Contact

Questions, collaboration, peer review, citation chains: team@jonashertner.com. Issue tracker: GitHub issues.