AI benchmarks you can audit yourself.

Every score on GetAI is anchored to a public Merkle root. Pull any bundle, recompute its SHA-256, and walk the proof up to the daily root; the verification itself needs no GetAI infrastructure. Trust nothing. Verify everything.
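
Walking the proof means hashing the bundle's leaf upward, combining it with one sibling hash per level, until you reach a root you can compare with the published one. Below is a minimal Python sketch of the inclusion-proof verification procedure from RFC 9162 (the successor to RFC 6962). It assumes GetAI's tree uses the RFC 6962 conventions (0x00 prefix for leaf hashes, 0x01 for interior nodes) and that a proof is a list of sibling hashes ordered from leaf to root; the function names and argument layout are hypothetical, not GetAI's published proof format.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    # RFC 6962 leaf hash: SHA-256 over a 0x00 prefix plus the leaf bytes
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    # RFC 6962 interior node hash: SHA-256 over a 0x01 prefix plus both children
    return hashlib.sha256(b"\x01" + left + right).digest()


def verify_inclusion(leaf: bytes, index: int, tree_size: int,
                     path: list[bytes], root: bytes) -> bool:
    """Walk an audit path (siblings ordered leaf to root) and compare with root."""
    if index >= tree_size:
        return False
    fn, sn = index, tree_size - 1
    r = leaf_hash(leaf)
    for p in path:
        if sn == 0:
            return False  # proof has more steps than the tree has levels
        if fn & 1 or fn == sn:
            r = node_hash(p, r)  # sibling sits on the left
            if not fn & 1:
                # right-edge node without a right sibling: skip the levels
                # where it is promoted unchanged
                while not fn & 1 and fn != 0:
                    fn >>= 1
                    sn >>= 1
        else:
            r = node_hash(r, p)  # sibling sits on the right
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root
```

In a 3-leaf tree, for example, the proof for leaf 2 is a single hash: the node covering leaves 0 and 1. The field names GetAI uses for the index, tree size, and path are not shown on this page.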

SHA-256 manifest · Daily Merkle root · Per-workspace RLS · Edge runtime verify

Infrastructure spend: $0 / month

How it works

One pipeline. Five proofs.

Every model invocation that lands on the leaderboard travels the same path. Each step is independently verifiable; we publish the cryptographic glue between them so you don't have to take our word for anything.

  1. Sandboxed call

    Deterministic parameters, captured headers, and a header-hash baseline.

  2. Predicate eval

    Eight-axis scorers and a single judge (Gemini 2.5 Flash); all results are provisional.

  3. Evidence bundle

    Canonical JSON serialized with orjson, a SHA-256 manifest, and signatures.

  4. Merkle anchor

    A daily root published at 00:00 UTC over an RFC 6962-style tree.

  5. Public verify

    The CLI, the edge function, and any third party reach the same answer.
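
Step 4 names an RFC 6962-style tree. Under that reading, the daily root over n leaves is the Merkle Tree Hash (MTH) defined in RFC 6962: leaves are hashed with a 0x00 prefix, interior nodes with a 0x01 prefix, and each list is split at the largest power of two smaller than its length. A minimal recursive Python sketch follows; what GetAI actually feeds in as leaf bytes (for example, each bundle's manifest hash) is an assumption.

```python
import hashlib


def mth(leaves: list[bytes]) -> bytes:
    """RFC 6962 Merkle Tree Hash over an ordered list of leaf inputs."""
    n = len(leaves)
    if n == 0:
        return hashlib.sha256(b"").digest()  # empty tree: hash of the empty string
    if n == 1:
        return hashlib.sha256(b"\x00" + leaves[0]).digest()  # leaf hash
    k = 1
    while k * 2 < n:  # largest power of two strictly less than n
        k *= 2
    left, right = mth(leaves[:k]), mth(leaves[k:])
    return hashlib.sha256(b"\x01" + left + right).digest()  # interior node
```

Changing any single leaf changes the root, which is why one published daily root pins every bundle beneath it.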

Why GetAI

Built for procurement-grade decisions.

Most AI benchmarks publish a number. GetAI publishes the number, the prompt, the response bytes, the judge verdicts, the cost snapshot, and a cryptographic proof you can replay six months from now.

Tenant-private eval · roadmap

Planned: distill a customer's support tickets into a private benchmark pack that is NDA-bound, isolated by row-level security (RLS), and never shown on the public board. Work begins when the first enterprise inquiry arrives.

Drift & silent-update surfacing

Live: score-drift detection (CUSUM, Page-Hinkley, MAD-based z-scores, and Mann-Whitney U tests with Benjamini-Hochberg correction) runs after each weekly full-pack run. Roadmap: header-hash and fingerprint probes to catch model swaps that do not show up as score changes.
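
Two of the named techniques are small enough to sketch. Below is a minimal Python illustration of a two-sided tabular CUSUM over one model's weekly scores and of Benjamini-Hochberg selection over a batch of p-values (for example, one Mann-Whitney test per model). The target mean, slack `k`, threshold `h`, and function names are hypothetical tuning values and names, not GetAI's settings.

```python
def cusum_alarms(scores: list[float], target: float,
                 k: float = 0.5, h: float = 4.0) -> list[int]:
    """Two-sided tabular CUSUM: indices where accumulated drift exceeds h."""
    hi = lo = 0.0
    alarms = []
    for i, x in enumerate(scores):
        hi = max(0.0, hi + (x - target - k))  # accumulates upward drift
        lo = max(0.0, lo + (target - x - k))  # accumulates downward drift
        if hi > h or lo > h:
            alarms.append(i)
            hi = lo = 0.0  # restart both sums after signalling
    return alarms


def benjamini_hochberg(pvalues: list[float], alpha: float = 0.05) -> list[int]:
    """Indices of hypotheses rejected while controlling the FDR at alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            cutoff = rank  # largest rank that meets the BH bound
    return sorted(order[:cutoff])
```

Resetting both sums after an alarm is one design choice; another is to keep accumulating until someone acknowledges the shift.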

Verifiable evidence chain

SHA-256 content-addressable storage + daily Merkle root + envelope encryption + GDPR tombstones. Every byte accountable.
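
In content-addressable storage an object's key is the SHA-256 of its bytes, so any reader can check integrity by rehashing. Below is a minimal in-memory Python sketch. It assumes a GDPR tombstone deletes the bytes but keeps the hash plus a deletion record, so existing proofs still point at a known, explained leaf. The class and method names are hypothetical, and envelope encryption is left out.

```python
import hashlib
from datetime import datetime, timezone


class ContentStore:
    """Hypothetical in-memory content-addressable store keyed by SHA-256 hex."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}
        self._tombstones: dict[str, str] = {}  # digest -> ISO-8601 erasure time

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._tombstones:  # erased content is never re-stored
            self._blobs[digest] = data  # identical bytes always share one key
        return digest

    def get(self, digest: str) -> bytes:
        if digest in self._tombstones:
            raise KeyError(f"{digest} was erased at {self._tombstones[digest]}")
        data = self._blobs[digest]  # raises KeyError if never stored
        # re-verify on read: the key must still match the stored bytes
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError(f"{digest} failed its integrity check")
        return data

    def tombstone(self, digest: str) -> None:
        # drop the bytes, keep a record that this hash existed and was erased
        self._blobs.pop(digest, None)
        self._tombstones[digest] = datetime.now(timezone.utc).isoformat()
```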

Traditional Chinese vertical packs

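Live: tw-ai-thinking-v1 (44 tasks, 5 rubrics). Next: tw-invoice-ops-v0.1 (statutory uniform-invoice operations). Further verticals (national health and labor insurance, customer-service claims, regulatory compliance) will be added as demand signals accumulate. These packs are built for Traditional Chinese use cases, not translated from MMLU.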

Live leaderboard

Phase 0 · single-judge, provisional

Every candidate below is scored by the same judge (Gemini 2.5 Flash) against the same task set, and every score is explicitly flagged as provisional. Use the board as a relative signal, not a final ranking. A second judge will be added once GetAI reaches its first paying-customer threshold; see the methodology page for the full roadmap.

Leaderboard columns: # · Vendor · Model · Score · Bundles · Last seen · Status

Audit this leaderboard yourself

Every score carries a Merkle-root proof.¹

No GetAI login, no API key, no vendor call required. Pull any Evidence Bundle² from the CDN and recompute its SHA-256 locally. The three commands below run on any Unix terminal with curl, unzip, openssl, and jq installed.

# Set ID to any bundle ID shown on the board
ID="paste-bundle-id-here"

# 1. Pull the Evidence Bundle ZIP (-o names the file explicitly; -O would name it after the URL path)
curl -sSfL -o "$ID.zip" "https://getai.getinfo.com.tw/api/bundle?id=$ID&format=zip"

# 2. Recompute the SHA-256 of the canonical manifest (-r prints "<hex> *stdin"; cut keeps the hex)
unzip -p "$ID.zip" manifest.json | openssl dgst -sha256 -r | cut -d' ' -f1

# 3. Read the expected hash from SIGNATURES.json; it must match the output of step 2
unzip -p "$ID.zip" SIGNATURES.json | jq -r .manifest_sha256
Full verification walkthrough
Commands are illustrative. Exact endpoint parameters are documented at /verify.
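
The same hash check can be scripted. Below is a minimal Python sketch using only the standard library. Like the commands above, it assumes the ZIP holds `manifest.json` and a `SIGNATURES.json` with a `manifest_sha256` field; the function name is hypothetical, and it checks only the hash, not signatures or the Merkle proof.

```python
import hashlib
import json
import zipfile


def manifest_matches(zip_source) -> bool:
    """Recompute manifest.json's SHA-256 and compare it with SIGNATURES.json."""
    with zipfile.ZipFile(zip_source) as zf:  # accepts a path or a file object
        actual = hashlib.sha256(zf.read("manifest.json")).hexdigest()
        expected = json.loads(zf.read("SIGNATURES.json"))["manifest_sha256"]
    return actual == expected.lower()
```

Call it as `manifest_matches("bundle.zip")`. A match shows the manifest agrees with the bundle's own SIGNATURES.json; on its own it does not prove the bundle is anchored in a daily root.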

¹ Merkle root: a single SHA-256 hash that summarises a tree of hashes. Because every bundle is anchored to a daily-published Merkle root, altering even one bundle changes the root it hashes up to, so the altered bundle no longer verifies against the published value.

² Evidence Bundle: the content-addressable archive for a single evaluation trial. It carries the input prompt, the model's response, the judge verdict, the final scores, and the Merkle proof, in canonical JSON and NDJSON formats.
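
Canonical serialization is what makes a bundle hash reproducible: the same logical record must always produce the same bytes. GetAI uses orjson; the Python sketch below substitutes the standard-library `json` module with sorted keys, compact separators, and UTF-8 output. That is a common canonical form, but it is not guaranteed to be byte-identical to GetAI's orjson output. The helper names and record fields are hypothetical.

```python
import hashlib
import json


def canonical_bytes(obj) -> bytes:
    # sorted keys + no insignificant whitespace + UTF-8, so key order cannot change the hash
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def to_ndjson(records) -> bytes:
    # NDJSON: one canonical JSON object per line, each newline-terminated
    return b"".join(canonical_bytes(r) + b"\n" for r in records)


# hypothetical trial record; real bundle field names may differ
trial = {"prompt": "p", "response": "r", "verdict": "pass", "score": 0.82}
trial_sha256 = hashlib.sha256(canonical_bytes(trial)).hexdigest()
```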

Launch-Gate 12.7

14 consecutive days of Merkle roots.

One of the hard pre-GA gates: every day for 14 consecutive days the daily Merkle root must be published and resolvable. Each green cell is a day with at least one bundle anchored.
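
The gate reduces to a check over the set of UTC dates that have an anchored bundle. Below is a minimal Python sketch, assuming you have already collected those dates (for example, from the published daily roots); the function names are hypothetical.

```python
from datetime import date, timedelta


def gate_passes(anchored_days: set[date], today: date, window: int = 14) -> bool:
    """True if every one of the `window` UTC days ending at `today` is anchored."""
    return all(today - timedelta(days=i) in anchored_days for i in range(window))


def current_streak(anchored_days: set[date], today: date) -> int:
    """Consecutive anchored days, counting back from `today`."""
    streak = 0
    while today - timedelta(days=streak) in anchored_days:
        streak += 1
    return streak
```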

[Chart: trailing 14 days (UTC), one cell per day; legend: day with anchored bundle, no bundle]

Evidence stream

Every bundle, downloadable, replay-verified.

The 10 most recent bundles in the chain. Click any row to run the full integrity check live at the edge: Cloudflare fetches the ZIP from R2, recomputes the SHA-256, and reports verified, tampered, or missing.
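
The three outcomes follow from a small decision: no object means missing, a hash mismatch means tampered, and anything else is verified. Below is a minimal Python sketch of that logic, assuming the expected hash is already known from the chain. The real check runs as a Cloudflare edge function against R2, which this sketch does not model; the function name is hypothetical.

```python
import hashlib
from typing import Optional


def integrity_status(blob: Optional[bytes], expected_sha256: str) -> str:
    """Classify a fetched bundle as 'verified', 'tampered', or 'missing'."""
    if blob is None:
        return "missing"  # object not found in storage
    actual = hashlib.sha256(blob).hexdigest()
    return "verified" if actual == expected_sha256.lower() else "tampered"
```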

Support the work

One coffee keeps the pipeline running.

The evidence-chain infrastructure is cheap and open-source-friendly, but the weekly judge run still costs real money. If GetAI is useful to you, a small one-time tip keeps it going.

Support GetAI: third-party checkout via Ko-fi.

Contribute US$5 or more and reply to the Ko-fi receipt with a short note, and we'll email you a full Evidence Bundle access code — the same content-addressable archive third parties use to recompute any score. Fulfillment is manual for now; Perry aims to send codes within 48 hours of the donation notification.