Every score on GetAI is anchored to a public Merkle root. Pull any bundle, recompute the SHA-256, walk the proof to the daily root — no GetAI infrastructure needed. Trust nothing. Verify everything.
Every model invocation that lands on the leaderboard travels the same path. Each step is independently verifiable; we publish the cryptographic glue between them so you don't have to take our word for anything.
Deterministic params, captured headers, header-hash baseline.
8-axis scorers + 3-judge ensemble (Phase 1).
Canonical orjson, SHA-256 manifest, signatures.
Daily root published 00:00 UTC, RFC 6962-style tree.
CLI, edge function, third party — same answer.
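As a rough illustration of the "walk the proof to the daily root" step, here is a minimal sketch of an RFC 6962-style inclusion check, using the verification steps described in RFC 9162. The page only says the tree is "RFC 6962-style", so the 0x00/0x01 domain-separation prefixes, the function names, and the idea that a leaf is derived from a bundle's bytes are assumptions, not GetAI's documented format.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    # RFC 6962 leaf hash: SHA-256 over a 0x00 prefix plus the leaf data.
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # RFC 6962 interior node hash: SHA-256 over a 0x01 prefix plus both children.
    return hashlib.sha256(b"\x01" + left + right).digest()


def verify_inclusion(leaf: bytes, leaf_index: int, tree_size: int,
                     path: list[bytes], root: bytes) -> bool:
    """Return True if `leaf` (already leaf-hashed) is at `leaf_index`
    in a tree of `tree_size` leaves whose root hash is `root`."""
    if leaf_index < 0 or leaf_index >= tree_size:
        return False
    fn, sn = leaf_index, tree_size - 1
    r = leaf
    for p in path:
        if sn == 0:
            return False  # proof is longer than the tree is deep
        if fn & 1 or fn == sn:
            # Current node is a right child: the sibling goes on the left.
            r = node_hash(p, r)
            if not fn & 1:
                # Skip levels where this node has no sibling.
                while fn and not fn & 1:
                    fn >>= 1
                    sn >>= 1
        else:
            # Current node is a left child: the sibling goes on the right.
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root
```

With a three-leaf tree, the proof for the last leaf is a single hash (the parent of the first two leaves), and `verify_inclusion` rebuilds the root from it without contacting any server.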
Most AI benchmarks publish a number. GetAI publishes the number, the prompt, the response bytes, the judge verdicts, the cost snapshot, and a cryptographic proof you can replay six months from now.
Distill your support tickets into a private benchmark pack in 48 hours. NDA-bound, RLS-isolated, never on the public board.
2-of-3 fusion (header hash + fingerprint cosine + vendor notes) catches model swaps your dashboard misses for weeks.
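The 2-of-3 rule can be pictured as a simple majority vote over three independent signals. The sketch below is hypothetical: the function names, the 0.95 cosine threshold, and the reduction of vendor notes to a single boolean are illustrative assumptions, not GetAI's actual detector.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def swap_suspected(baseline_header_hash: str, current_header_hash: str,
                   baseline_fingerprint: list[float],
                   current_fingerprint: list[float],
                   vendor_note_flag: bool,
                   cosine_threshold: float = 0.95) -> bool:
    """Flag a possible model swap when at least two of three signals fire."""
    signals = [
        # Signal 1: response headers no longer hash to the recorded baseline.
        current_header_hash != baseline_header_hash,
        # Signal 2: behavioural fingerprint drifted (threshold is an assumption).
        cosine(baseline_fingerprint, current_fingerprint) < cosine_threshold,
        # Signal 3: vendor release notes mention a change (assumed boolean).
        vendor_note_flag,
    ]
    return sum(signals) >= 2
```

Requiring two signals rather than one trades a little sensitivity for fewer false alarms: a header change alone (for example, a CDN tweak) would not raise a flag.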
SHA-256 content-addressable storage + daily Merkle root + envelope encryption + GDPR tombstones. Every byte accountable.
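Content-addressable storage means a blob's key is the SHA-256 of its bytes, so any read can be re-checked against its own address. The in-memory class below is a minimal sketch of that idea only; `ContentStore` is a hypothetical name, and envelope encryption and tombstones are deliberately left out.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: the key is the SHA-256 of the value."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content, never chosen by the caller.
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]  # raises KeyError if the blob is absent
        # Re-hash on read so silent corruption or tampering is detected.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored bytes do not match their address")
        return data
```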
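Real Taiwan workloads (invoice OCR, National Health Insurance and Labor Insurance official documents, customer-service claims handling, regulatory compliance), not translated MMLU.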
One model is currently being measured against the tw-coding-daily-v1 smoke pack. The queued rows below ship in Phase 1 (Q3 2026) under the D8 three-judge ensemble. Phase 0 scores are provisional and not eligible for public ranking until then.
One of the hard pre-GA gates: the daily Merkle root must be published and resolvable every day for 14 consecutive days. Each green cell is a day with at least one bundle anchored.
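The 14-day gate amounts to a streak check over the days on which a root was published and resolvable. A minimal sketch, assuming the window ends on the day being evaluated and that the set of good days has already been collected (both assumptions; `gate_passed` is a hypothetical name):

```python
from datetime import date, timedelta


def gate_passed(resolvable_days: set[date], today: date,
                required_days: int = 14) -> bool:
    """True if every one of the last `required_days` days, ending at
    `today` inclusive, has a published and resolvable daily root."""
    return all(
        today - timedelta(days=offset) in resolvable_days
        for offset in range(required_days)
    )
```

A single missing day anywhere in the window resets the gate, which is what makes it a consecutive-days requirement rather than a simple count.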
The 10 most recent bundles in the chain. Click any row to see the full integrity check run live at the edge: Cloudflare fetches the ZIP from R2, recomputes the SHA-256, and reports verified / tampered / missing.
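The same three-way verdict can be reproduced locally once you have the bundle bytes and the expected digest. The sketch below assumes the expected SHA-256 is available as a hex string and that a failed fetch is represented as `None`; `check_bundle` is a hypothetical name, not part of any GetAI CLI.

```python
import hashlib
from typing import Optional


def check_bundle(zip_bytes: Optional[bytes], expected_sha256_hex: str) -> str:
    """Return 'verified', 'tampered', or 'missing' for a bundle ZIP."""
    if zip_bytes is None:
        # The object could not be fetched from storage.
        return "missing"
    actual = hashlib.sha256(zip_bytes).hexdigest()
    # hexdigest() is lowercase, so normalise the expected value too.
    if actual == expected_sha256_hex.strip().lower():
        return "verified"
    return "tampered"
```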