Engineering · transparent
Methodology
How OpenCaseLaw finds, ranks and verifies Swiss court decisions — end-to-end, with every weight and threshold on the record. This page is the canonical reference; anything you read elsewhere should match what's written here, or the code wins.
1. Corpus & refresh
The corpus holds 971,852 published Swiss decisions across 108 courts, in three languages (DE 449,847; FR 441,288; IT 80,717), spanning 1875 – 2026. It covers eight federal courts (BGer 175,439; BVGer 92,293; BGE 35,350; plus BStGer, BPatGer, MKG and others), 26 cantonal jurisdictions, and ECHR-Switzerland (~2,800 decisions across BGE translations, HUDOC, Chamber/Grand Chamber/Committee).
Each decision is fetched from its primary source: the court's own publication portal where available (all 26 cantons are scraped directly via LexWork / SIL / ZH OpenData / TI RL, with a LexFind PDF fallback supplementing 4 cantons for missing laws), entscheidsuche.ch shards for legacy fills, and BGer's official search back-end for federal updates. The pipeline refreshes on three schedules:
- Every 15 minutes on weekdays 05:00–16:00 UTC — the BGer poller incrementally publishes new federal decisions to the live database within a few minutes of their court-side release.
- Daily 01:00 UTC — every active cantonal scraper runs; a soft-fail policy gates on the 5 critical federal scrapers and a 15% failure-rate threshold across the rest.
- Daily 03:30 UTC — full FTS5 rebuild, citation graph rebuild, quality gate, stats refresh, atomic swap with zero-downtime. Workers keep serving the previous DB until the new one is integrity-checked.
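The soft-fail policy of the daily cantonal run can be sketched as a small gate function. This is a minimal illustration, not the project's code: `ScraperResult` and `nightly_run_passes` are hypothetical names, and two details are assumptions (any failed critical scraper blocks the run, and the 15% threshold is inclusive).

```python
from dataclasses import dataclass

MAX_FAILURE_RATE = 0.15  # threshold from the text, applied to non-critical scrapers


@dataclass
class ScraperResult:
    name: str
    critical: bool  # True for the critical federal scrapers
    ok: bool


def nightly_run_passes(results: list[ScraperResult]) -> bool:
    """Return True when the nightly scrape may proceed."""
    # Assumption: any failed critical scraper blocks the run outright.
    if any(r.critical and not r.ok for r in results):
        return False
    rest = [r for r in results if not r.critical]
    if not rest:
        return True
    failure_rate = sum(not r.ok for r in rest) / len(rest)
    # Assumption: the threshold is inclusive (exactly 15% still passes).
    return failure_rate <= MAX_FAILURE_RATE
```

With 20 non-critical scrapers, two failures (10%) would pass under this sketch and four failures (20%) would not.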
The full corpus is also published as Parquet on Hugging Face (voilaj/swiss-caselaw) under CC0; the daily delta is appended automatically.
2. Full-text index
SQLite FTS5 with the unicode61 remove_diacritics 2
tokenizer. The decisions table feeds the index across these columns,
with column-level BM25 weights tuned empirically against a 100-query
golden set:
| Column | BM25 weight | Why |
|---|---|---|
| title | 6.0 | Highest signal — concise, intentional. |
| regeste | 5.5 | Court's own topical summary. |
| docket_number | 2.0 | For exact-docket recall. |
| full_text | 1.2 | Long, noisy — anchor, not driver. |
| court / canton / language / decision_id | 0.8 | Metadata, not content. |
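Column weights like these are passed to FTS5's `bm25()` auxiliary function in column-declaration order. The sketch below uses a hypothetical four-column table, `demo_fts`, with made-up rows; the metadata columns are left out for brevity, and it assumes a Python build whose SQLite includes FTS5.

```python
import sqlite3

# Weights from the table above: title, regeste, docket_number, full_text.
WEIGHTS = (6.0, 5.5, 2.0, 1.2)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE demo_fts USING fts5("
    "title, regeste, docket_number, full_text, "
    "tokenize = 'unicode61 remove_diacritics 2')"
)
conn.executemany(
    "INSERT INTO demo_fts VALUES (?, ?, ?, ?)",
    [
        ("Mietrecht Kündigung", "", "4A_1/2020", "Sachverhalt und Erwägungen."),
        ("Strafrecht", "", "6B_2/2020", "Die Kündigung wird nur am Rande erwähnt."),
        ("Steuerrecht", "", "2C_3/2020", "Veranlagung der direkten Bundessteuer."),
    ],
)


def search(query: str) -> list[str]:
    # bm25() returns lower (more negative) values for better matches,
    # so ascending order puts the best hit first.
    rows = conn.execute(
        "SELECT docket_number FROM demo_fts WHERE demo_fts MATCH ? "
        "ORDER BY bm25(demo_fts, ?, ?, ?, ?)",
        (query, *WEIGHTS),
    )
    return [row[0] for row in rows]
```

Calling `search("kundigung")` matches both rows that contain "Kündigung" (the tokenizer strips the diacritic), and the row with the title hit should rank ahead of the row with the full-text hit.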
The index is rebuilt atomically: build_fts5.py writes to
decisions.db.tmp, then moves it into place with os.replace().
Workers using the ?immutable=1 SQLite URI keep
their open file handles on the old inode until they next reconnect,
so the swap is invisible to live readers.
build_fts5.py · BM25 weights configured in
mcp_server.py around lines 431–442 · diacritic tokenizer
mirrored in decision_structure.db for per-paragraph search.
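The build-then-swap pattern can be sketched as follows. This is an illustration under assumptions, not the contents of build_fts5.py: `publish` and `open_reader` are hypothetical helpers, and the old-inode behaviour described above holds on POSIX filesystems.

```python
import os
import sqlite3
from pathlib import Path


def publish(db_path: Path, build) -> None:
    """Build a fresh database next to db_path, then swap it in atomically."""
    tmp_path = db_path.with_name(db_path.name + ".tmp")
    if tmp_path.exists():
        tmp_path.unlink()
    conn = sqlite3.connect(tmp_path)
    try:
        build(conn)  # caller-supplied function that fills the new database
        conn.commit()
        # Refuse to publish a file that fails SQLite's own consistency check.
        status = conn.execute("PRAGMA integrity_check").fetchone()[0]
        if status != "ok":
            raise RuntimeError(f"integrity_check failed: {status}")
    finally:
        conn.close()
    os.replace(tmp_path, db_path)  # atomic rename on the same filesystem


def open_reader(db_path: Path) -> sqlite3.Connection:
    # immutable=1 tells SQLite the file will not change under this connection.
    uri = db_path.resolve().as_uri() + "?immutable=1"
    return sqlite3.connect(uri, uri=True)
```

A reader opened before the swap keeps answering from the old file, while a reader opened afterwards sees the new one.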
3. Query understanding
A natural-language query never hits FTS5 raw. First it passes through
sanitization (_sanitize_fts5): apostrophes, hyphens and
dots-without-word-chars collapse to spaces; reserved tokens (AND, OR,
NOT, NEAR) are preserved only when they have operands on both sides;
the single Swiss-legal token "OR" (Obligationenrecht abbreviation)
is always force-quoted because it would otherwise be parsed as a
boolean operator.
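The sanitization rules above can be approximated in a few lines. The function below is a hypothetical stand-in for `_sanitize_fts5`, written only from the description on this page. It assumes that a reserved token lacking an operand on either side is quoted rather than dropped, and that a dot counts as "without word chars" unless it sits between two word characters.

```python
import re

RESERVED = {"AND", "OR", "NOT", "NEAR"}


def sanitize_query(query: str) -> str:
    # Apostrophes (straight and typographic) and hyphens become spaces.
    text = re.sub(r"['’\-]", " ", query)
    # Dots that do not sit between two word characters become spaces.
    text = re.sub(r"(?<!\w)\.|\.(?!\w)", " ", text)
    tokens = text.split()
    out = []
    for i, tok in enumerate(tokens):
        if tok == "OR":
            # Always quoted: "OR" is the Obligationenrecht abbreviation.
            out.append('"OR"')
        elif tok in RESERVED:
            has_left = i > 0 and tokens[i - 1] not in RESERVED
            has_right = i + 1 < len(tokens) and tokens[i + 1] not in RESERVED
            # Keep the operator only when it has an operand on both sides.
            out.append(tok if has_left and has_right else f'"{tok}"')
        else:
            out.append(tok)
    return " ".join(out)
```

Under these assumptions, `sanitize_query("l'employeur Art. 41 OR")` yields `l employeur Art 41 "OR"`, while `Miete AND Kündigung` passes through unchanged.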
In parallel, the query is routed to Claude Haiku 4.5 for a 2-second structured parse: statute references, doctrine terms in DE/FR/IT, leading-BGE mentions, synonyms, and legal domain are extracted as deterministic JSON. The output is cached by lowercase query for the session lifetime.
Three normalizations make Swiss legal vocabulary searchable across its stylistic, orthographic and cross-language variants:
- Diacritic-insensitive tokenization. The FTS5 unicode61 remove_diacritics 2 tokenizer strips diacritics on both index and query side, so "Prüfung", "PRÜFUNG" and a query for "Prufung" all hit the same posting list.
- Umlaut-spelling collapse. The ae/oe/ue diacritic-less spelling (common in older judgments and in any pre-Unicode environment) is collapsed to a/o/u, which the tokenizer then unifies with the diacritic-stripped ä/ö/ü. So "Pruefung" finds "Prüfung" too.
- LLM synonym expansion. Claude Haiku's structured parse generates 2–4 alternative legal terms in DE/FR/IT per query, on the fly rather than from a static table. So "qualité pour recourir" cross-links to "Beschwerdebefugnis" / "Beschwerdelegitimation" / "legittimazione" even if your query contained only one of them.
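The first two normalizations can be imitated in pure Python to see why the three spellings meet. This is a sketch, not the production path: the real diacritic stripping happens inside the FTS5 tokenizer, and the helper names here are hypothetical.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    # Decompose characters, then drop the combining marks (ü -> u).
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def collapse_umlaut_digraphs(text: str) -> str:
    # ae/oe/ue -> a/o/u, the diacritic-less spelling described above.
    for digraph, vowel in (("ae", "a"), ("oe", "o"), ("ue", "u")):
        text = text.replace(digraph, vowel)
    return text


def normalize_token(token: str) -> str:
    return collapse_umlaut_digraphs(strip_diacritics(token.lower()))
```

Here "Prüfung", "PRÜFUNG" and "Pruefung" all normalize to `prufung`. A blunt digraph collapse also rewrites words such as "aktuell", so matching only works if the same collapse is applied to indexed text and query alike, which this sketch assumes.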
4. Retrieval & fusion
The query fans out into 10–12 strategies that run as independent FTS5 / graph queries. Each strategy carries a strategy weight, and the ranked lists from all strategies are fused using Reciprocal Rank Fusion (RRF rank constant = 60), so a decision that ranks highly in several strategies accumulates score from each of them.
| Strategy | Weight | Effect |
|---|---|---|
| nl_and | 1.8 | Natural-language AND across terms. |
| raw | 1.5 | Query verbatim. |
| regeste_focus | 1.4 | Constrains to the regeste column. |
| nl_or | 1.2 | OR fallback (cost-aware early stop). |
| structured_doctrine | 1.1–3.5 | Doctrine terms from Haiku parse. |
| quoted_explicit | 1.1 | Phrase match when quotes seen. |
| nl_or_expanded | 1.0 | OR + synonym/umlaut/compound expansion. |
| title_focus | 0.95 | Title-column-constrained match. |
| doctrine_regeste / doctrine_title | 2.5 / 1.6 | Concept-translation strategies. |
Beyond FTS5, the same RRF pool also receives:
- Statute-graph candidates — decisions linked to the statute articles parsed from the query, fed in as a parallel ranked list.
- BGE direct lookups — when the query (or its parse) mentions a specific BGE reference, that decision is hard-injected at the top.
- Docket-style hits — strings like
6B_1234/2025 are routed to an exact-docket lookup, short-circuiting most of the pipeline.
The fused candidate pool is sized dynamically (default ~300–400 for a top-50 request) and capped at 2,500 rows before reranking begins.
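Weighted Reciprocal Rank Fusion over the strategy lists can be sketched as below. The rank constant and the example weights come from this page; how exactly the strategy weight enters the formula is an assumption (here it multiplies each `1 / (k + rank)` term), and `weighted_rrf` is a hypothetical name.

```python
from collections import defaultdict

RRF_K = 60  # rank constant from the text


def weighted_rrf(
    ranked_lists: dict[str, list[str]],
    weights: dict[str, float],
) -> list[tuple[str, float]]:
    """Fuse per-strategy rankings into one list, best first."""
    scores: dict[str, float] = defaultdict(float)
    for strategy, hits in ranked_lists.items():
        w = weights.get(strategy, 1.0)
        for rank, decision_id in enumerate(hits, start=1):
            # Each strategy contributes weight / (k + rank) for its hit.
            scores[decision_id] += w / (RRF_K + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


fused = weighted_rrf(
    {"nl_and": ["A", "B"], "raw": ["B", "A"], "regeste_focus": ["B"]},
    {"nl_and": 1.8, "raw": 1.5, "regeste_focus": 1.4},
)
```

In this toy input, decision B is returned by three strategies and A by two, so B ends up first even though A holds rank 1 in the highest-weighted strategy.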
5. Reranking
Each candidate gets a vector of signals; the final score is a linear
combination tuned against the golden set. Signal weights
(_rerank_rows):
| Signal | Weight | Caps / notes |
|---|---|---|
| RRF score | 32.0 | Aggregate of all strategy ranks. |
| Docket exact / partial | 6.0 / 2.0 | String-level docket match. |
| Title coverage | 3.0 | Fraction of claim tokens in title. |
| Regeste coverage | 3.0 | Same, against the Regeste. |
| Statute mentions | 3.5 (base), 0.5/mention, cap 2.0 | Decision-to-article links. |
| Citation hits | 2.4 (base), 0.30/hit, cap 1.2 | Cross-result citation evidence. |
| Authority (incoming citations) | 0.03/citation, cap 1.0 | Why a Leitentscheid floats. |
| Language match | +2.0 | When the result's language matches the query. |
| Expanded coverage | 1.5 / 0.8 | Synonym + compound match credit. |
| Court-domain heuristics | ±0.2 – +1.7 | BVGer asylum + BGer high-court bias when intent matches. |
After the linear scorer, an LLM rerank pass fires when needed:
- Model: Claude Haiku 4.5; top-N: 15; timeout: 3s; weight: 3.0 with linear decay w × max(0, 1 − rank/15).
- A confidence gate skips the call entirely if the top-1 lexical score is already ≥ 2× the top-2 score (the answer is unambiguous; reranking is pure cost).
- The pass is also skipped on docket-style queries (an exact match needs no LLM).
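The gate and the decay formula fit in two small functions. This is a sketch with hypothetical names. It assumes lexical scores are positive and sorted best first, that fewer than two candidates means there is nothing to rerank, and that `rank` is zero-based so the LLM's first pick receives the full weight.

```python
LLM_TOP_N = 15
LLM_WEIGHT = 3.0


def should_call_llm(lexical_scores: list[float], is_docket_query: bool) -> bool:
    """Decide whether the LLM rerank pass is worth its cost."""
    if is_docket_query or len(lexical_scores) < 2:
        return False
    top1, top2 = lexical_scores[0], lexical_scores[1]
    # Confidence gate: skip when top-1 is already at least twice top-2.
    return not (top1 >= 2 * top2)


def llm_boost(llm_rank: int) -> float:
    # w * max(0, 1 - rank/15), with a zero-based rank (assumption).
    return LLM_WEIGHT * max(0.0, 1.0 - llm_rank / LLM_TOP_N)
```

Under these assumptions the boost falls from 3.0 at rank 0 to 2.0 at rank 5 and reaches zero at rank 15.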
6. Pinpoint citations (live · May 2026)
Every top-5 search result and top-3 leading-cases result carries a
pinpoint field naming the specific Erwägung (legal
holding paragraph) most relevant to the query. The resolver runs
over a per-decision FTS5 paragraph index in
decision_structure.db (≈8.8 M paragraphs across 807 K
decisions). The two-pass design:
- Phrase pass. The claim is run as an exact FTS5 phrase — high precision when the user's phrasing matches the court's.
- Bag-of-words OR pass. When the phrase returns nothing, the same tokens are fired as an OR query for broader recall.
A confidence scorer
(_score_pinpoint_confidence) combines three independent
signals to label the result:
- BM25 gap. Rank-1 score over rank-2; ratio > 1.5 admits "high", > 1.2 admits "medium".
- Absolute strength. For single-row matches with no rank-2 to compare, absolute |BM25| > 2.0 for "high", > 1.0 for "medium". A previous version of this branch used a 999.0 sentinel for single-row matches, which silently promoted thin matches to "high"; that was the false-confidence bug fixed in May 2026.
- Token coverage. Distinct claim tokens (> 2 chars, with ~70 generic Swiss-legal-discourse stopwords filtered) that appear in the matched paragraph. Multi-token claims with < 50 % coverage are suppressed entirely; < 70 % is capped at "medium".
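One way to combine the three signals is sketched below. It is not the real `_score_pinpoint_confidence`: the order of the checks and the "low" label for matches under the "medium" thresholds are assumptions, and the caller is expected to pass claim tokens that are already filtered (longer than 2 characters, stopwords removed).

```python
from typing import Optional


def score_pinpoint_confidence(
    bm25_scores: list[float],
    claim_tokens: set[str],
    paragraph_tokens: set[str],
) -> Optional[str]:
    """Label the rank-1 paragraph; None means the pinpoint is suppressed."""
    if not bm25_scores:
        return None
    top = abs(bm25_scores[0])
    if len(bm25_scores) >= 2 and bm25_scores[1] != 0:
        # BM25 gap: rank-1 magnitude over rank-2 magnitude.
        gap = top / abs(bm25_scores[1])
        label = "high" if gap > 1.5 else "medium" if gap > 1.2 else "low"
    else:
        # Single-row match: use absolute strength, never a sentinel value.
        label = "high" if top > 2.0 else "medium" if top > 1.0 else "low"
    if len(claim_tokens) > 1:
        coverage = len(claim_tokens & paragraph_tokens) / len(claim_tokens)
        if coverage < 0.5:
            return None  # suppressed entirely
        if coverage < 0.7 and label == "high":
            label = "medium"  # capped
    return label
```

A single-row match with |BM25| = 1.5 comes out as "medium" here, which is the case the old 999.0 sentinel would have promoted to "high".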
Semantic rescue (infrastructure deployed · corpus ~33 % encoded)
fires when the lexical pass returns nothing. The claim is encoded
with sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
(117 M params, 384-dim normalized vectors, multilingual DE/FR/IT/RM
plus 47 others); cosine similarity is computed over the decision's
< 300 paragraph embeddings; a match surfaces at "high" if
cosine ≥ 0.70, "medium" if ≥ 0.55, none otherwise. A
hybrid mode, switched on once encoding completes,
runs both signals and treats agreement on the same
Erwägung as cross-signal evidence — confidence boosted to "high",
source: "hybrid_agreement".
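The thresholding step of semantic rescue can be sketched without the encoder. The vectors below are toy two-dimensional stand-ins for the 384-dimensional embeddings, and `semantic_rescue` is a hypothetical name; only the 0.70 and 0.55 cut-offs come from this page.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_rescue(
    claim_vec: list[float],
    paragraph_vecs: dict[str, list[float]],
) -> Optional[tuple[str, str, float]]:
    """Return (paragraph id, confidence label, cosine) or None."""
    if not paragraph_vecs:
        return None
    best_id, best_sim = max(
        ((pid, cosine(claim_vec, vec)) for pid, vec in paragraph_vecs.items()),
        key=lambda item: item[1],
    )
    if best_sim >= 0.70:
        return best_id, "high", best_sim
    if best_sim >= 0.55:
        return best_id, "medium", best_sim
    return None
```

With normalized embeddings the cosine reduces to a dot product; the explicit norms are kept so the sketch also works on unnormalized input.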
The pinpoint URL is structured so the browser auto-scrolls to and
highlights the matched paragraph: ?highlight=<verbatim
sentence>&e=2.3#e-2-3. The query parameter drives a
server-side highlight; the hash fragment fires native browser
auto-scroll.
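Building such a URL is mostly a matter of percent-encoding the sentence and deriving the fragment from the Erwägung number. The helper below is a hypothetical sketch, and the base URL is a placeholder rather than a real route.

```python
from urllib.parse import quote


def pinpoint_url(decision_url: str, sentence: str, erwaegung: str) -> str:
    # Fragment id: "2.3" -> "e-2-3", matching the pattern shown above.
    anchor = "e-" + erwaegung.replace(".", "-")
    return (
        f"{decision_url}?highlight={quote(sentence, safe='')}"
        f"&e={quote(erwaegung, safe='')}#{anchor}"
    )


url = pinpoint_url("https://example.org/decision/1", "Die Frist beträgt 30 Tage.", "2.3")
```

The sentence is percent-encoded as UTF-8, so "beträgt" becomes `betr%C3%A4gt` and spaces become `%20`.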
7. Citation graph
Every decision's text is parsed for citations to other decisions (BGE references, BGer dockets, BVGer references). The raw layer contains 8.65 M edges; the resolver lifts 8.09 M of them (92.9 %) to canonical decision IDs. The remaining 7 % are real decisions we don't yet hold (older BGE pre-1875, withdrawn court publications, OCR-broken refs).
Resolution runs as four cascaded passes:
- Standard docket join. Citation's normalized docket = target's normalized docket. Handles 80%+ of edges.
- BGE prefix bidirectional. Matches whether the citation includes the BGE prefix or not, with the target stored in either form. This single fix in March 2026 nearly tripled BGE resolution.
- Bare BGE dockets. Targets matching volume division page are treated as bare BGE refs and prepended with BGE.
- Pin-cite resolution. When a citation contains a page number not present in our targets (e.g. BGE 125 V 352 where the first page is 351), we find the largest first-page ≤ 352 within the same (volume, division) and within 30 pages. Confidence is penalized by 0.10 to reflect the inference.
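The pin-cite pass can be sketched as a lookup over known first pages. The function and the `first_pages` mapping are hypothetical, the first pages other than 351 are made up, and the sketch assumes a base confidence of 1.0 and an inclusive 30-page window.

```python
from typing import Optional

MAX_PAGE_DISTANCE = 30
INFERENCE_PENALTY = 0.10


def resolve_pin_cite(
    volume: int,
    division: str,
    cited_page: int,
    first_pages: dict[tuple[int, str], list[int]],
    base_confidence: float = 1.0,
) -> Optional[tuple[str, float]]:
    """Map a cited page to the decision that starts at or before it."""
    candidates = [
        page
        for page in first_pages.get((volume, division), [])
        if page <= cited_page and cited_page - page <= MAX_PAGE_DISTANCE
    ]
    if not candidates:
        return None
    first_page = max(candidates)  # largest first page not after the cited page
    confidence = base_confidence
    if first_page != cited_page:
        confidence -= INFERENCE_PENALTY  # the match is inferred, not exact
    return f"BGE {volume} {division} {first_page}", confidence


known = {(125, "V"): [332, 351, 368]}
resolved = resolve_pin_cite(125, "V", 352, known)
```

In this example a citation to page 352 resolves to `BGE 125 V 351` with the penalty applied, while a citation to page 400 finds no first page within 30 pages and stays unresolved.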
Top-cited authority (computed live from
reference_graph.db on 2026-05-11): BGE 125 V 351
with 85,108 incoming citations, followed by
BGE 134 V 231 and BGE 122 V 157, each in the tens of thousands.
These counts feed back into search reranking as the authority
signal (weight 0.03 per citation, capped at 1.0), which is why
classic Leitentscheide float to the top even when their language
hits are no stronger than other candidates'.
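Read as a plain capped product, the authority signal is a one-liner. This sketch assumes the cap is applied with a simple `min`, and `authority_signal` is a hypothetical name.

```python
AUTHORITY_PER_CITATION = 0.03
AUTHORITY_CAP = 1.0


def authority_signal(incoming_citations: int) -> float:
    # Linear in the citation count until the cap is reached.
    return min(AUTHORITY_PER_CITATION * incoming_citations, AUTHORITY_CAP)
```

Under that reading the signal saturates at 34 incoming citations, so BGE 125 V 351 with its 85,108 citations contributes the same 1.0 as any other decision past that point.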
8. Quality assurance
Every nightly publish is gated by a 4-layer QC framework. L4 (the publish gate, runs L1's CRITICAL subset) is the hard blocker: if it fails, the swap is refused and users keep yesterday's corpus until the issue is investigated. The other layers are continuous backstops — L2 fires on every commit in CI, L3 every 5 min on the live server.
- L1: dataset checks (63 across 20 modules). Per-court drift, duplicate detection, short-text / OCR detection, citation-graph resolution rate, statute-link coverage, date / docket sanity, missing-field counts, plus LLM spot-verification on a rotating sample.
- L2: pytest suite (516 currently passing). Unit tests cover every parsing, dedup and ranking primitive; new regressions land as new tests via the "incident → regression test" pattern.
- L3: smoke test (every 5 min). Production health probe: /health, an anchor decision page, and the PDF export pipeline. Three consecutive failures escalate to an INVESTIGATE alert.
- L4: publish gate. Of L1's checks, those labelled CRITICAL must pass before the new corpus is committed and pushed; if any fail, the publish refuses to swap and yesterday's data stays live. The dashboard at /quality.html shows each check's last status and history.
On top of QC, decision-quoting LLM workflows route through a
five-rail closing audit
(attest_response) that catches four hallucination
classes — invented citation, invented quote, invented statute
text, invented date — and an optional fifth grounding-judge that
checks whether the proposition the LLM attached to a cited paragraph
is actually supported by that paragraph's text.
9. Tested and rejected
Engineering decisions get more trustworthy when the dead ends are visible. The following techniques were measured on the same golden set as the rest and did not earn their place:
- Cross-encoder reranking (cross-encoder/mmarco-mMiniLMv2-L12-H384-v1 as the current placeholder): rejected. Multilingual cross-encoders trained on generic web QA didn't transfer cleanly to Swiss legal vocabulary on the golden set; they hurt MRR rather than helping. The helper is still in the code behind _apply_cross_encoder_boosts for future fine-tuning experiments but is not invoked in production.
- Off-the-shelf BGE-M3 dense vectors: rejected. Encoded the corpus, added vector RRF as a fourth strategy. No improvement over BM25 + RRF + Haiku at any vector weight. The vectors.db was removed March 2026. The paragraph-level semantic embeddings shipped this week are a separate, narrower intervention: they only score within a single decision's paragraphs, where BM25 has well-known limits.
- Larger Haiku top-N: rejected. Reranking the top-30 or top-50 instead of top-15 produced no MRR gain and a linear cost increase. Top-15 is the empirical sweet spot for confidence-gated reranking.
- Larger candidate pool: rejected. Increasing the pool above ~400 candidates (for default queries) did not move recall meaningfully; the FTS5 + RRF combination already surfaces the right candidates within the first few hundred hits. The 2,500 cap remains as a safety ceiling.
10. Open access
The corpus is CC0 (public domain dedication). The code is MIT, hosted at github.com/jonashertner/caselaw-repo-1 — every weight, threshold and heuristic on this page is auditable in source. No accounts, no cookies, no query logs (privacy policy: /datenschutz/).
Programmatic access:
- REST API. mcp.opencaselaw.ch/docs — OpenAPI 3.0.3 with 47 endpoints (search, get, leading cases, trends, citation graph, statutes, commentaries, Botschaft, exports, pinpoint, attestation).
- MCP server. mcp.opencaselaw.ch/sse — Server-Sent Events transport, exposing 33 tools. Connected by Claude, ChatGPT (OpenAI MCP), Cursor, Microsoft Copilot Studio, Perplexity, and others.
- Hugging Face dataset. voilaj/swiss-caselaw — Parquet shards, daily delta-published.
- Word add-in. opencaselaw.ch/word/ — Office add-in for Microsoft Word; surfaces the same search + pinpoint stack inside the document.
This page reflects the live system. If you spot a discrepancy between what's described here and what the source says, the source wins — open an issue and we'll fix one of them.