opencaselaw.ch

Engineering · transparent

Methodology

How OpenCaseLaw finds, ranks and verifies Swiss court decisions — end-to-end, with every weight and threshold on the record. This page is the canonical reference; anything you read elsewhere should match what's written here, or the code wins.

  1. Corpus & refresh
  2. Full-text index
  3. Query understanding
  4. Retrieval & fusion
  5. Reranking
  6. Pinpoint citations
  7. Citation graph
  8. Quality assurance
  9. Tested and rejected
  10. Open access

1. Corpus & refresh

The corpus holds 971,852 published Swiss decisions across 108 courts, in three languages (DE 449,847; FR 441,288; IT 80,717), spanning 1875 – 2026. Eight federal courts (BGer 175,439; BVGer 92,293; BGE 35,350; plus BStGer, BPatGer, MKG and others), 26 cantonal jurisdictions, and ECHR-Switzerland (~2,800 decisions across BGE-translations, HUDOC, Chamber/Grand-Chamber/Committee).

Each decision is fetched from its primary source — the court's own publication portal where available (all 26 cantons direct-scraped via LexWork / SIL / ZH OpenData / TI RL, with LexFind PDF fallback supplementing 4 cantons for missing laws), entscheidsuche.ch shards for legacy fills, BGer's official search back-end for federal updates. The pipeline refreshes on three schedules:

The full corpus is also published as Parquet on Hugging Face (voilaj/swiss-caselaw) under CC0; the daily delta is appended automatically.

2. Full-text index

SQLite FTS5 with the unicode61 remove_diacritics 2 tokenizer. The decisions table feeds the index across these columns, with column-level BM25 weights tuned empirically against a 100-query golden set:

| Column | BM25 weight | Why |
| --- | --- | --- |
| title | 6.0 | Highest signal: concise, intentional. |
| regeste | 5.5 | Court's own topical summary. |
| docket_number | 2.0 | For exact-docket recall. |
| full_text | 1.2 | Long, noisy; an anchor, not a driver. |
| court / canton / language / decision_id | 0.8 | Metadata, not content. |
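As a minimal sketch of how column-weighted BM25 can be expressed in SQLite FTS5, the example below builds a small in-memory index with the same tokenizer and passes the four content weights from the table to bm25(). The table name decisions_fts, the sample rows and the search() helper are illustrative assumptions, not the project's schema; the metadata columns are left out for brevity.

```python
import sqlite3

# In-memory index; the table name and rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE VIRTUAL TABLE decisions_fts USING fts5(
        title, regeste, docket_number, full_text,
        tokenize = 'unicode61 remove_diacritics 2'
    )
    """
)
conn.executemany(
    "INSERT INTO decisions_fts VALUES (?, ?, ?, ?)",
    [
        ("Mietrecht Zürich", "Kündigung der Wohnung", "4A_1/2020",
         "Der Mieter"),
        ("Arbeitsrecht", "Lohnforderung", "4A_2/2020",
         "Erwähnt am Rande das Mietrecht in Zürich"),
    ],
)


def search(query: str) -> list[str]:
    """Return docket numbers, best match first.

    bm25() takes one weight per column in declaration order and returns
    lower (more negative) values for better matches, so ascending order
    puts the strongest hit first.
    """
    rows = conn.execute(
        """
        SELECT docket_number
        FROM decisions_fts
        WHERE decisions_fts MATCH ?
        ORDER BY bm25(decisions_fts, 6.0, 5.5, 2.0, 1.2)
        """,
        (query,),
    ).fetchall()
    return [row[0] for row in rows]
```

Searching for "zurich" matches "Zürich" in both rows because the tokenizer strips diacritics; the row with the hit in the title ranks ahead of the row where the term only appears in full_text.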

The index is rebuilt atomically: build_fts5.py writes to decisions.db.tmp then os.replace()'s it into place. Workers using the ?immutable=1 SQLite URI keep their open file handles on the old inode until they next reconnect — the swap is invisible to live readers.
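The build-then-swap pattern can be sketched as follows. This is not the code in build_fts5.py; the function names, the single-column schema and the POSIX-style paths are assumptions made for illustration. Only os.replace() and the ?immutable=1 URI parameter come from the description above.

```python
import os
import sqlite3
from pathlib import Path


def rebuild_atomically(db_path: str, titles: list[str]) -> None:
    """Build a fresh database next to the live one, then swap it in."""
    tmp_path = db_path + ".tmp"
    if os.path.exists(tmp_path):
        os.remove(tmp_path)  # discard leftovers from an aborted build
    conn = sqlite3.connect(tmp_path)
    try:
        # Hypothetical one-column schema, just enough to show the swap.
        conn.execute("CREATE TABLE decisions (title TEXT)")
        conn.executemany(
            "INSERT INTO decisions VALUES (?)", [(t,) for t in titles]
        )
        conn.commit()
    finally:
        conn.close()
    # Atomic rename: readers see either the old file or the new one.
    os.replace(tmp_path, db_path)


def read_titles(db_path: str) -> list[str]:
    """Open the database read-only via the immutable URI flag."""
    uri = Path(db_path).resolve().as_uri() + "?immutable=1"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute("SELECT title FROM decisions ORDER BY title")
        return [row[0] for row in rows]
    finally:
        conn.close()
```

A reader that connected before the swap keeps reading the old file until it reconnects; a reader that connects afterwards sees the new contents.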

build_fts5.py · BM25 weights configured in mcp_server.py around lines 431–442 · diacritic tokenizer mirrored in decision_structure.db for per-paragraph search.

3. Query understanding

A natural-language query never hits FTS5 raw. First it passes through sanitization (_sanitize_fts5): apostrophes, hyphens and dots-without-word-chars collapse to spaces; reserved tokens (AND, OR, NOT, NEAR) are preserved only when they have operands on both sides; the single Swiss-legal token "OR" (Obligationenrecht abbreviation) is always force-quoted because it would otherwise be parsed as a boolean operator.
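A rough sketch of these sanitization rules, under stated assumptions: the function name sanitize_fts5 is hypothetical (it is not the project's _sanitize_fts5), "dots-without-word-chars" is read as dots not flanked by word characters on both sides, and a reserved token without operands on both sides is quoted rather than dropped.

```python
import re

# Reserved FTS5 keywords other than OR, which gets special treatment.
RESERVED = {"AND", "NOT", "NEAR"}


def sanitize_fts5(query: str) -> str:
    """Illustrative sanitizer; the real rules may differ in detail."""
    # Apostrophes and hyphens collapse to spaces.
    text = re.sub(r"['’\-]", " ", query)
    # Assumption: a dot is dropped unless it sits between word characters.
    text = re.sub(r"(?<!\w)\.|\.(?!\w)", " ", text)
    tokens = text.split()
    out = []
    for i, tok in enumerate(tokens):
        if tok == "OR":
            # Obligationenrecht abbreviation: always a quoted literal.
            out.append('"OR"')
        elif tok in RESERVED:
            has_operands = 0 < i < len(tokens) - 1
            # Assumption: dangling operators are quoted, not removed.
            out.append(tok if has_operands else f'"{tok}"')
        else:
            out.append(tok)
    return " ".join(out)
```

For example, "Art. 41 OR" becomes `Art 41 "OR"`, so FTS5 treats OR as a search term instead of an operator.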

In parallel, the query is routed to Claude Haiku 4.5 for a 2-second structured parse: statute references, doctrine terms in DE/FR/IT, leading-BGE mentions, synonyms, and legal domain are extracted as deterministic JSON. The output is cached by lowercase query for the session lifetime.

Three normalizations make Swiss legal vocabulary searchable across its stylistic, orthographic and cross-language variants:

4. Retrieval & fusion

The query fans out into 10–12 strategies that run as independent FTS5 / graph queries. Each strategy carries a strategy weight; the ranked lists from all strategies are fused using Reciprocal Rank Fusion (RRF rank constant = 60), so a decision's fused score depends on its rank within each strategy rather than on raw BM25 scores.

| Strategy | Weight | Effect |
| --- | --- | --- |
| nl_and | 1.8 | Natural-language AND across terms. |
| raw | 1.5 | Query verbatim. |
| regeste_focus | 1.4 | Constrains to the regeste column. |
| nl_or | 1.2 | OR fallback (cost-aware early stop). |
| structured_doctrine | 1.1–3.5 | Doctrine terms from Haiku parse. |
| quoted_explicit | 1.1 | Phrase match when quotes seen. |
| nl_or_expanded | 1.0 | OR + synonym/umlaut/compound expansion. |
| title_focus | 0.95 | Title-column-constrained match. |
| doctrine_regeste / doctrine_title | 2.5 / 1.6 | Concept-translation strategies. |
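One common way to combine per-strategy weights with RRF is to add weight / (k + rank) for every strategy in which a decision appears. The sketch below assumes that form; the source states the weights and the rank constant of 60, but not the exact formula, and the fuse() helper is illustrative.

```python
from collections import defaultdict

RRF_K = 60  # rank constant stated above


def fuse(
    rankings: dict[str, list[str]],
    weights: dict[str, float],
) -> list[tuple[str, float]]:
    """Weighted Reciprocal Rank Fusion over per-strategy ranked ID lists.

    Assumed scoring: score(d) = sum over strategies of w_s / (k + rank_s(d)),
    with ranks starting at 1. Strategies missing from `weights` count as 1.0.
    """
    scores: dict[str, float] = defaultdict(float)
    for strategy, ranked_ids in rankings.items():
        weight = weights.get(strategy, 1.0)
        for rank, decision_id in enumerate(ranked_ids, start=1):
            scores[decision_id] += weight / (RRF_K + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With nl_and ranking [a, b] and raw ranking [b, a], decision a scores 1.8/61 + 1.5/62 and b scores 1.8/62 + 1.5/61, so a narrowly wins because the higher-weighted strategy put it first.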

Beyond FTS5, the same RRF pool also receives:

The fused candidate pool is sized dynamically (default ~300–400 for a top-50 request) and capped at 2,500 rows before reranking begins.

5. Reranking

Each candidate gets a vector of signals; the final score is a linear combination tuned against the golden set. Signal weights (_rerank_rows):

| Signal | Weight | Caps / notes |
| --- | --- | --- |
| RRF score | 32.0 | Aggregate of all strategy ranks. |
| Docket exact / partial | 6.0 / 2.0 | String-level docket match. |
| Title coverage | 3.0 | Fraction of claim tokens in title. |
| Regeste coverage | 3.0 | Same, against the Regeste. |
| Statute mentions | 3.5 (base), 0.5/mention, cap 2.0 | Decision-to-article links. |
| Citation hits | 2.4 (base), 0.30/hit, cap 1.2 | Cross-result citation evidence. |
| Authority (incoming citations) | 0.03/citation, cap 1.0 | Why a Leitentscheid floats. |
| Language match | +2.0 | When the result's language matches the query. |
| Expanded coverage | 1.5 / 0.8 | Synonym + compound match credit. |
| Court-domain heuristics | ±0.2 – +1.7 | BVGer asylum + BGer high-court bias when intent matches. |
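To make the shape of the linear scorer concrete, here is a partial sketch that covers only the signals whose arithmetic is unambiguous from the table: RRF score, title and regeste coverage, authority and language match. The function names and the exact way the terms are summed are assumptions; _rerank_rows itself may combine them differently and includes further signals.

```python
def authority_credit(incoming_citations: int) -> float:
    """0.03 per incoming citation, capped at 1.0 (cap reached at 34)."""
    return min(0.03 * incoming_citations, 1.0)


def linear_score(
    rrf_score: float,
    title_coverage: float,    # fraction of claim tokens found in the title
    regeste_coverage: float,  # same fraction, against the regeste
    incoming_citations: int,
    language_match: bool,
) -> float:
    """Illustrative subset of the weighted sum; other signals are omitted."""
    return (
        32.0 * rrf_score
        + 3.0 * title_coverage
        + 3.0 * regeste_coverage
        + authority_credit(incoming_citations)
        + (2.0 if language_match else 0.0)
    )
```

Because of the cap, a decision with 100 incoming citations and one with 85,108 receive the same authority credit of 1.0; the signal separates cited from uncited decisions rather than ranking the most-cited ones against each other.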

After the linear scorer, an LLM rerank pass fires when needed:

Measured impact. On the frozen 100-query golden set (74 DE, 16 FR, 7 IT) against 1,078,177 decisions: lexical+RRF alone yields MRR@10 = 0.470, Hit@1 = 0.33, Recall@10 = 0.496. With Haiku reranking enabled, the same online benchmark moves to MRR@10 ≈ 0.647 and Hit@1 ≈ 0.57 — a +37% / +73% lift, at the cost of one extra Haiku call on queries that don't satisfy the confidence gate.
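For readers who want to reproduce the arithmetic, the sketch below uses the standard definitions of MRR@k and Hit@1 and computes the relative lift from the figures quoted above. The helper names and the input convention (rank of the first relevant result per query, or None when nothing relevant was returned) are assumptions, not the benchmark harness.

```python
from typing import Optional


def mrr_at_k(first_relevant_ranks: list[Optional[int]], k: int = 10) -> float:
    """Mean reciprocal rank, counting only hits within the top k."""
    total = sum(
        1.0 / rank
        for rank in first_relevant_ranks
        if rank is not None and rank <= k
    )
    return total / len(first_relevant_ranks)


def hit_at_1(first_relevant_ranks: list[Optional[int]]) -> float:
    """Share of queries whose first result is relevant."""
    hits = sum(1 for rank in first_relevant_ranks if rank == 1)
    return hits / len(first_relevant_ranks)


def relative_lift(before: float, after: float) -> float:
    """Relative improvement, e.g. 0.25 for a 25 % lift."""
    return (after - before) / before


mrr_lift = relative_lift(0.470, 0.647)  # about 0.377
hit_lift = relative_lift(0.33, 0.57)    # about 0.727
```

The two lifts come out at roughly 37.7 % and 72.7 %, in line with the +37 % / +73 % quoted above.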

6. Pinpoint citations (live · May 2026)

Every top-5 search result and top-3 leading-cases result carries a pinpoint field naming the specific Erwägung (legal holding paragraph) most relevant to the query. The resolver runs over a per-decision FTS5 paragraph index in decision_structure.db (≈8.8 M paragraphs across 807 K decisions). The two-pass design:

  1. Phrase pass. The claim is run as an exact FTS5 phrase — high precision when the user's phrasing matches the court's.
  2. Bag-of-words OR pass. When the phrase returns nothing, the same tokens are fired as an OR query for broader recall.
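The two passes can be sketched against a small FTS5 paragraph index as follows. The table layout, the function names and the returned source labels are illustrative assumptions; decision_structure.db may be organized differently.

```python
import sqlite3
from typing import Optional


def build_paragraph_index(paragraphs: list[tuple[str, str]]) -> sqlite3.Connection:
    """Index (erwaegung, body) pairs; the schema is a stand-in."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE paragraphs USING fts5("
        "erwaegung UNINDEXED, body, "
        "tokenize = 'unicode61 remove_diacritics 2')"
    )
    conn.executemany("INSERT INTO paragraphs VALUES (?, ?)", paragraphs)
    return conn


def _quote(text: str) -> str:
    """Wrap text as an FTS5 string literal, doubling embedded quotes."""
    return '"' + text.replace('"', '""') + '"'


def resolve_pinpoint(
    conn: sqlite3.Connection, claim: str
) -> Optional[tuple[str, str]]:
    """Return (erwaegung, source) from the phrase pass, else the OR pass."""
    tokens = claim.split()
    if not tokens:
        return None
    passes = (
        ("phrase", _quote(" ".join(tokens))),
        ("bag_of_words", " OR ".join(_quote(t) for t in tokens)),
    )
    for source, match_expr in passes:
        row = conn.execute(
            "SELECT erwaegung FROM paragraphs WHERE paragraphs MATCH ? "
            "ORDER BY bm25(paragraphs) LIMIT 1",
            (match_expr,),
        ).fetchone()
        if row:
            return row[0], source
    return None
```

Quoting every token keeps words such as OR or NOT from being parsed as operators, and the second pass only runs when the exact phrase finds nothing.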

A confidence scorer (_score_pinpoint_confidence) combines three independent signals to label the result:

Semantic rescue (infrastructure deployed · corpus ~33 % encoded) fires when the lexical pass returns nothing. The claim is encoded with sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 (117 M params, 384-dim normalized vectors, multilingual DE/FR/IT/RM plus 47 others); cosine similarity is computed over the decision's paragraph embeddings (fewer than 300 per decision); a match surfaces at "high" if cosine ≥ 0.70, "medium" if ≥ 0.55, none otherwise. A hybrid mode, switched on once encoding completes, runs both signals and treats agreement on the same Erwägung as cross-signal evidence: confidence is boosted to "high", with source: "hybrid_agreement".
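The threshold logic can be illustrated without loading the embedding model. The sketch below takes precomputed vectors and applies the 0.70 / 0.55 cut-offs to the best-matching paragraph; the function names and the dictionary input are assumptions, and in the deployed system the vectors would come from the sentence-transformers model named above.

```python
import math
from typing import Optional

HIGH_THRESHOLD = 0.70
MEDIUM_THRESHOLD = 0.55


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_rescue(
    claim_vec: list[float],
    paragraph_vecs: dict[str, list[float]],
) -> Optional[tuple[str, str]]:
    """Return (erwaegung, confidence) for the closest paragraph, or None."""
    best = max(
        ((cosine(claim_vec, vec), erw) for erw, vec in paragraph_vecs.items()),
        default=None,
    )
    if best is None:
        return None
    score, erwaegung = best
    if score >= HIGH_THRESHOLD:
        return erwaegung, "high"
    if score >= MEDIUM_THRESHOLD:
        return erwaegung, "medium"
    return None
```

With normalized vectors the cosine reduces to a dot product, so a brute-force scan over a few hundred paragraphs per decision stays cheap.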

The pinpoint URL is structured so the browser auto-scrolls to and highlights the matched paragraph: ?highlight=<verbatim sentence>&e=2.3#e-2-3. The query parameter drives a server-side highlight; the hash fragment fires native browser auto-scroll.
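A small sketch of how such a URL could be assembled with the standard library. The pinpoint_url() helper and the base path in the usage note are hypothetical; only the highlight and e parameters and the e-2-3 fragment shape come from the description above.

```python
from urllib.parse import urlencode


def pinpoint_url(base_url: str, sentence: str, erwaegung: str) -> str:
    """Build ?highlight=<sentence>&e=<n.m>#e-<n>-<m> on top of base_url."""
    # urlencode percent-encodes the sentence and writes spaces as "+".
    query = urlencode({"highlight": sentence, "e": erwaegung})
    # The fragment mirrors the Erwägung number with dots turned into dashes.
    fragment = "e-" + erwaegung.replace(".", "-")
    return f"{base_url}?{query}#{fragment}"
```

For instance, pinpoint_url("/decision/x", "Treu und Glauben", "2.3") gives /decision/x?highlight=Treu+und+Glauben&e=2.3#e-2-3, where /decision/x stands in for the real decision path.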

7. Citation graph

Every decision's text is parsed for citations to other decisions (BGE references, BGer dockets, BVGer references). The raw layer contains 8.65 M edges; the resolver lifts 8.09 M of them (92.9 %) to canonical decision IDs. The remaining 7 % are real decisions we don't yet hold (older BGE pre-1875, withdrawn court publications, OCR-broken refs).

Resolution runs as four cascaded passes:

  1. Standard docket join. Citation's normalized docket = target's normalized docket. Handles 80%+ of edges.
  2. BGE prefix bidirectional. Matches whether the citation includes the BGE prefix or not, with the target stored in either form. This single fix in March 2026 nearly tripled BGE resolution.
  3. Bare BGE dockets. Targets whose docket has the shape "volume division page" are treated as bare BGE refs and prefixed with "BGE ".
  4. Pin-cite resolution. When a citation contains a page number not present in our targets (e.g. BGE 125 V 352 where the first page is 351), we find the largest first-page ≤ 352 within the same (volume, division) and within 30 pages. Confidence is penalized by 0.10 to reflect the inference.
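The pin-cite pass (step 4) amounts to a predecessor search over the first pages known for one (volume, division). A minimal sketch, assuming the first pages are available as a sorted list, that "within 30 pages" is inclusive, and that confidence starts from a hypothetical base of 1.0:

```python
from bisect import bisect_right
from typing import Optional

MAX_PAGE_GAP = 30        # "within 30 pages", assumed inclusive
PIN_CITE_PENALTY = 0.10  # confidence reduction for an inferred match


def resolve_pin_cite(
    cited_page: int,
    first_pages: list[int],
    base_confidence: float = 1.0,
) -> Optional[tuple[int, float]]:
    """Map a cited page to the decision that starts at or before it.

    `first_pages` must be sorted ascending and hold the first pages of
    all known decisions in the same (volume, division).
    """
    idx = bisect_right(first_pages, cited_page)
    if idx == 0:
        return None  # no decision starts at or before the cited page
    first_page = first_pages[idx - 1]
    if cited_page - first_page > MAX_PAGE_GAP:
        return None  # too far away to attribute safely
    if first_page == cited_page:
        return first_page, base_confidence  # exact hit, no inference
    return first_page, base_confidence - PIN_CITE_PENALTY
```

For the example above, a citation to page 352 with known first pages [300, 351, 400] resolves to 351 at confidence 0.9.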

Top-cited authority (computed live from reference_graph.db on 2026-05-11): BGE 125 V 351 with 85,108 incoming citations, followed by BGE 134 V 231 and BGE 122 V 157, each with tens of thousands. These counts feed back into search reranking as the authority signal (weight 0.03 per citation, capped at 1.0), which is why classic Leitentscheide float to the top even when their language hits are no stronger than other candidates'.

8. Quality assurance

Every nightly publish is gated by a 4-layer QC framework. L4 (the publish gate, runs L1's CRITICAL subset) is the hard blocker: if it fails, the swap is refused and users keep yesterday's corpus until the issue is investigated. The other layers are continuous backstops — L2 fires on every commit in CI, L3 every 5 min on the live server.

On top of QC, decision-quoting LLM workflows route through a five-rail closing audit (attest_response) that catches four hallucination classes — invented citation, invented quote, invented statute text, invented date — and an optional fifth grounding-judge that checks whether the proposition the LLM attached to a cited paragraph is actually supported by that paragraph's text.

9. Tested and rejected

Engineering decisions get more trustworthy when the dead ends are visible. The following techniques were measured on the same golden set as the rest and did not earn their place:

10. Open access

The corpus is CC0 (public domain dedication). The code is MIT, hosted at github.com/jonashertner/caselaw-repo-1 — every weight, threshold and heuristic on this page is auditable in source. No accounts, no cookies, no query logs (privacy policy: /datenschutz/).

Programmatic access:

This page reflects the live system. If you spot a discrepancy between what's described here and what the source says, the source wins — open an issue and we'll fix one of them.