Research preprint · March 2026
OpenCaseLaw: An Open Dataset and Search Platform for Swiss Court Decisions
Abstract
We present OpenCaseLaw, an open corpus and retrieval stack for Swiss case law. In the repository snapshot generated on March 20, 2026, the dataset contains 962,724 decisions from 102 federal, cantonal, and regulatory courts or public bodies, covering all 26 cantons and the period 1875–2026. The current snapshot contains 448,461 German decisions (46.6 %), 434,663 French decisions (45.1 %), and 79,600 Italian decisions (8.3 %); the export schema also reserves a Romansh language code. OpenCaseLaw releases a 34-field Parquet export together with tooling to build a local SQLite FTS5 index and reference database; in the March 20, 2026 graph build, that database contains 8.76 million extracted case-citation references, 6.42 million resolved in-corpus decision links, and 11.23 million decision-statute links. We describe the collection, normalization, deduplication, export, and retrieval pipeline, and release a multilingual benchmark harness with 100 tagged evaluation queries plus a release-matched offline baseline report. The code is MIT-licensed, and each record links back to the decision as published by the originating court or public body.
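The abstract mentions tooling to build a local SQLite FTS5 index. The following is a minimal sketch of that kind of index, written with Python's built-in sqlite3 module and assuming the SQLite build includes FTS5, as most CPython distributions do. The table name decisions_fts, its three columns and the toy records are hypothetical. They are not the released 34-field schema or the project's build script.

```python
import sqlite3

# Hypothetical toy records (decision_id, language, full_text); the released
# export has 34 fields, and these three names are illustrative only.
DECISIONS = [
    ("BGer-1", "de", "Mietrecht: Kündigung wegen Eigenbedarf des Vermieters"),
    ("BGer-2", "fr", "Droit du bail: résiliation pour besoin propre du bailleur"),
    ("BGer-3", "de", "Strafrecht: Betrug und Arglist bei Anlagegeschäften"),
]


def build_index(conn: sqlite3.Connection) -> None:
    """Create an FTS5 table (hypothetical name) and load the toy decisions."""
    # UNINDEXED columns are stored and returned but not tokenised for search.
    conn.execute(
        "CREATE VIRTUAL TABLE decisions_fts USING fts5("
        "decision_id UNINDEXED, language UNINDEXED, full_text)"
    )
    conn.executemany(
        "INSERT INTO decisions_fts (decision_id, language, full_text) "
        "VALUES (?, ?, ?)",
        DECISIONS,
    )
    conn.commit()


def search(conn: sqlite3.Connection, query: str, limit: int = 10) -> list[str]:
    """Return decision IDs matching an FTS5 query, best BM25 match first."""
    # FTS5's bm25() gives smaller (more negative) values to better matches,
    # so ascending order puts the strongest hit first.
    rows = conn.execute(
        "SELECT decision_id FROM decisions_fts "
        "WHERE decisions_fts MATCH ? ORDER BY bm25(decisions_fts) LIMIT ?",
        (query, limit),
    )
    return [decision_id for (decision_id,) in rows]
```

For example, after build_index(conn) on an in-memory connection, search(conn, "Kündigung") returns only BGer-1. The default unicode61 tokenizer case-folds terms and handles non-ASCII words, which matters for a German/French/Italian corpus.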
Contributions
- Open corpus. 971,067 published Swiss court decisions (snapshot 2026-05-02; the paper's figures refer to the March 20, 2026 snapshot) across 121 federal, regulatory, international, and cantonal courts. Released as a 34-field Parquet export under CC0, making it the largest open dataset of Swiss case law.
- Open retrieval stack. SQLite FTS5 index with an atomic-rebuild pipeline; retrieval combines BM25, reciprocal rank fusion (RRF), structured LLM query parsing, and Haiku reranking. The frozen offline baseline (March 19, 2026) reaches MRR 0.470 and the online configuration with LLM components reaches 0.647; both are reproducible from benchmarks/.
- Citation graph. 8.76 M extracted case-citation references, of which 6.42 M resolve to in-corpus decisions (73.8 % link rate), plus 11.23 M extracted decision-statute links across 281 K distinct provisions.
- Open MCP surface. 29 specialised tools at mcp.opencaselaw.ch: search, citation graph, statute lookup, doctrine timelines, multilingual commentary, and exam-question generation. Free, no API keys, OpenAPI 3.0.3.
- Multilingual benchmark. 100 tagged queries (74 de · 16 fr · 7 it · 3 cross-lingual) across 15 legal domains, verified against the citation graph, plus a 50-question cross-lingual leading-case retrieval set with parallel DE/FR/IT phrasings.
- Quality framework. A four-layer dataset quality framework: 61 dataset checks, 46 pytest checks, an L4 production smoke test, and a Step 6c gate that blocks git push on critical regressions. Public dashboard at opencaselaw.ch/quality.html.
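The retrieval stack fuses ranked lists with RRF. Below is a minimal sketch of Reciprocal Rank Fusion. It assumes each retriever (for instance, BM25 runs over different query reformulations) returns a list of decision IDs, best first. The function name, the input shape and the constant k = 60 (a commonly used default) are assumptions, not OpenCaseLaw's code or its configured value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of IDs with Reciprocal Rank Fusion.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in, with
    rank starting at 1. k = 60 is an assumed default, not the project's value.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

RRF uses only ranks, not raw scores, so lists from retrievers with incomparable score scales (BM25 values, LLM relevance judgments) can be combined without normalisation. For example, fusing ["A", "B", "C"] with ["B", "C", "D"] yields ["B", "C", "A", "D"]: documents found by both lists rise to the top.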
Cite
Until the paper appears on arXiv, please cite as:
@misc{hertner2026opencaselaw,
author = {Hertner, Jonas},
title = {OpenCaseLaw: An Open Dataset and Search Platform for Swiss Court Decisions},
year = {2026},
howpublished = {Preprint, March 2026},
url = {https://opencaselaw.ch/paper/opencaselaw-arxiv-final.pdf},
note = {Snapshot 2026-03-20: 962,724 decisions; current corpus 971,067 (May 2026)}
}
Materials
paper.pdf
Full paper (internal cold-read pass complete).
submission.pdf
Submission-formatted PDF (NeurIPS D&B template).
source.tar.gz
Full reproducible source bundle.
paper.md
Plain-text version for diffing & web reading.
github.com/jonashertner/caselaw-repo-1
All scrapers, retrieval stack, MCP server, build pipeline.
voilaj/swiss-caselaw
Daily-refreshed Parquet export of the full corpus.
Reproducibility
Every numerical claim in the paper resolves to a commit-pinned artifact in the public repository. The 100-question multilingual benchmark lives at benchmarks/swiss_legal_rag_bench/v1.jsonl, with the 50-question cross-lingual companion at benchmarks/swiss_legal_rag_bench/cross_lingual_v1.jsonl. Frozen offline retrieval results are pinned at benchmarks/search_benchmark_2026-03-19_offline_full.json. The dataset quality framework lives in quality/; its public dashboard is at /quality.html.
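The benchmark results above are reported as mean reciprocal rank (MRR). Below is a minimal sketch of how MRR could be computed against a JSONL benchmark file. The record fields "id" and "relevant_ids" are hypothetical; the actual v1.jsonl schema may differ, and this is not the project's evaluation harness.

```python
import json
from collections.abc import Iterable


def load_relevant(jsonl_lines: Iterable[str]) -> dict[str, set[str]]:
    """Map query ID to relevant decision IDs (field names are hypothetical)."""
    relevant: dict[str, set[str]] = {}
    for line in jsonl_lines:
        if line.strip():  # skip blank lines
            record = json.loads(line)
            relevant[record["id"]] = set(record["relevant_ids"])
    return relevant


def mean_reciprocal_rank(
    results: dict[str, list[str]], relevant: dict[str, set[str]]
) -> float:
    """Average over benchmark queries of 1/rank of the first relevant hit.

    A query with no relevant hit (or no results at all) contributes 0.
    """
    total = 0.0
    for query_id, relevant_ids in relevant.items():
        for rank, doc_id in enumerate(results.get(query_id, []), start=1):
            if doc_id in relevant_ids:
                total += 1.0 / rank
                break
    return total / len(relevant) if relevant else 0.0
```

Because load_relevant accepts any iterable of lines, an open file handle on the benchmark JSONL can be passed directly. With three queries whose first relevant hits sit at ranks 2, none, and 1, the score is (0.5 + 0 + 1) / 3 = 0.5.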
Contact
Questions, collaboration, peer review, citation chains: team@jonashertner.com. Issue tracker: GitHub issues.