Scorecard

How good is good?

A forecast is only honest if it's scored. We publish every number ahead of kickoff and grade it once the match is played — model and agent both. The hard part is knowing what a good score even is, so we anchor it explicitly.

The floor is a naive prior: always predict the long-run base rate for each outcome. Beating it is the minimum bar. The sharp market is the practical ceiling: once its odds are converted to fair probabilities (the venue's margin removed), it is hard to beat. The model is judged by where it lands between the two; the agent is judged only by whether its calls improve the model's score.
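For readers who want the arithmetic behind these benchmarks, here is a minimal Python sketch of how a naive-prior floor can be scored. It assumes the summed multi-class Brier score over home, draw and away (0 is perfect, 2 is worst); the page does not state which Brier variant it uses, and the base rates below are hypothetical placeholders, not the values behind the floor on this page.

```python
from typing import List, Sequence, Tuple

Probs = Tuple[float, float, float]  # assumed order: (home win, draw, away win)

# Hypothetical long-run base rates, for illustration only.
BASE_RATE: Probs = (0.45, 0.27, 0.28)

def brier(probs: Sequence[float], outcome: int) -> float:
    """Summed multi-class Brier score for one match: 0 is perfect, 2 is worst."""
    return sum((p - (1.0 if k == outcome else 0.0)) ** 2 for k, p in enumerate(probs))

def mean_brier(forecasts: Sequence[Sequence[float]], outcomes: Sequence[int]) -> float:
    """Average Brier over all scored matches; lower is better."""
    return sum(brier(f, o) for f, o in zip(forecasts, outcomes)) / len(outcomes)

def top_pick_rate(forecasts: Sequence[Sequence[float]], outcomes: Sequence[int]) -> float:
    """Share of matches where the most likely outcome is the one that happened."""
    hits = sum(max(range(len(f)), key=lambda k: f[k]) == o for f, o in zip(forecasts, outcomes))
    return hits / len(outcomes)

def naive_prior(n_matches: int) -> List[Probs]:
    """The floor: the same base-rate forecast for every match."""
    return [BASE_RATE] * n_matches
```

With the hypothetical prior above and results home, home, away, home, the floor scores a mean Brier of about 0.539 and a 75% top-pick rate: a constant forecast still "picks" the home side every time.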

The model, against its benchmarks

4 matches scored

Floor · naive prior

Brier 0.582 · 75.0% top-pick

Always predict the base rate: the minimum bar.

Our model

Brier 0.580 · 75.0% top-pick

Beating the floor by 0.002 Brier (lower is better).

Sharp market · benchmark

Brier not yet available: per-match prices are pending.

We publish the model beside each market venue and score both on the same matches as results land.
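One way to score model and market together is to compare them only on matches where both have a price and a result, then express the model's Brier as a position between the floor and the market. This is a sketch under assumptions: the position formula is a hypothetical normalisation (0 at the floor, 1 at the market), the page does not publish how it combines the two, and the function names are illustrative.

```python
from typing import Dict, Tuple

Probs = Tuple[float, float, float]  # (home, draw, away), an assumed ordering

def brier(probs: Probs, outcome: int) -> float:
    """Summed multi-class Brier score for one match."""
    return sum((p - (1.0 if k == outcome else 0.0)) ** 2 for k, p in enumerate(probs))

def score_together(model: Dict[str, Probs], market: Dict[str, Probs],
                   results: Dict[str, int]) -> Tuple[float, float]:
    """Mean Brier for model and market over matches that have both a market price and a result."""
    shared = [m for m in results if m in model and m in market]
    if not shared:
        raise ValueError("no match has both a market price and a result yet")
    n = len(shared)
    model_score = sum(brier(model[m], results[m]) for m in shared) / n
    market_score = sum(brier(market[m], results[m]) for m in shared) / n
    return model_score, market_score

def position_between(floor: float, model: float, market: float) -> float:
    """Hypothetical normalisation: 0.0 = no better than the floor, 1.0 = as good as the market.
    Negative means worse than the floor; above 1.0 means better than the market."""
    if floor == market:
        raise ValueError("floor and market scores are equal; position is undefined")
    return (floor - model) / (floor - market)
```

With a floor of 0.582, the model at 0.580, and a hypothetical market Brier of 0.560 (the real one is still pending), the model would sit about 9% of the way from floor to ceiling.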

The full record

Calibration

When the model says 60%, does it happen 60% of the time? Reliability bins from the walk-forward backtest.
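Reliability bins can be built in a few lines. A minimal sketch, assuming each outcome probability is scored as its own yes/no forecast (one-vs-rest) and ten equal-width bins; the page does not say how its bins are drawn, and the walk-forward backtest that produces the forecasts is not shown here.

```python
from typing import List, Sequence, Tuple

def reliability_bins(probs: Sequence[float], happened: Sequence[bool],
                     n_bins: int = 10) -> List[Tuple[float, float, int]]:
    """Group forecast probabilities into equal-width bins and compare the mean forecast
    with the observed frequency in each bin. Returns (mean_forecast, observed_rate, count)
    for every non-empty bin, lowest bin first."""
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for p, h in zip(probs, happened):
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls into the top bin
        sums[i] += p
        hits[i] += h  # True counts as 1
        counts[i] += 1
    return [(sums[i] / counts[i], hits[i] / counts[i], counts[i])
            for i in range(n_bins) if counts[i]]
```

A well-calibrated model has an observed rate close to the mean forecast in every bin; with only a few forecasts per bin, gaps are mostly noise.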

Track record

Champion odds as they moved across model runs, and the live Brier over the scored matches (0.580 vs the floor's 0.582).

Model vs market

The model beside Polymarket and Kalshi, each de-vigged separately — never pooled into a consensus.
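De-vigging each venue separately can look like this. A minimal sketch, assuming each venue quotes a price between 0 and 1 for home, draw and away whose sum exceeds 1 by the venue's margin, and using proportional normalisation, the simplest de-vig method; the page does not say which method it applies, and the prices below are made up.

```python
from typing import Dict, List, Sequence

def devig_proportional(prices: Sequence[float]) -> List[float]:
    """Scale one venue's outcome prices so they sum to 1 (proportional de-vig)."""
    total = sum(prices)
    if total <= 0:
        raise ValueError("prices must sum to a positive number")
    return [p / total for p in prices]

def devig_each_venue(prices_by_venue: Dict[str, Sequence[float]]) -> Dict[str, List[float]]:
    """De-vig every venue on its own; results stay side by side and are never averaged."""
    return {venue: devig_proportional(prices) for venue, prices in prices_by_venue.items()}

# Hypothetical home/draw/away prices, for illustration only.
quotes = {"Polymarket": [0.50, 0.28, 0.26], "Kalshi": [0.52, 0.27, 0.25]}
fair = devig_each_venue(quotes)
```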

Agent scorecard

Does the agent's reading of live context improve on the model? Every move, scored once the match is played.

The agent, scored


72/72 matches analysed

Picks overturned

4 calls graded

75.0% agent top-pick

Over 4 graded calls, the agent's top pick was right 75.0% of the time (3 of 4). Each call's match page also shows a signed score delta versus the model: whether that specific move helped or hurt. The agent's biggest nudge so far: France v Iraq, +12.1 pp.
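The signed score delta and the size of a nudge reduce to simple arithmetic. A minimal sketch, assuming the delta is the model's Brier minus the agent's Brier on the same match (so positive means the agent helped) and that a nudge is the largest single-outcome probability shift in percentage points; both conventions are assumptions made for this illustration, not documented definitions.

```python
from typing import Sequence

def brier(probs: Sequence[float], outcome: int) -> float:
    """Summed multi-class Brier score for one match (lower is better)."""
    return sum((p - (1.0 if k == outcome else 0.0)) ** 2 for k, p in enumerate(probs))

def score_delta(model: Sequence[float], agent: Sequence[float], outcome: int) -> float:
    """Assumed sign convention: positive when the agent's forecast scored better than the model's."""
    return brier(model, outcome) - brier(agent, outcome)

def biggest_nudge_pp(model: Sequence[float], agent: Sequence[float]) -> float:
    """Largest signed shift, in percentage points, the agent applied to any single outcome."""
    shifts = [(a - m) * 100 for m, a in zip(model, agent)]
    return max(shifts, key=abs)

# Hypothetical match (home, draw, away): the agent moves the home side from 60% to 70%.
model_probs = (0.60, 0.25, 0.15)
agent_probs = (0.70, 0.20, 0.10)
```

For this hypothetical match, a home win gives a delta of +0.105 (the move helped) and an away win gives -0.195 (it hurt); the nudge is +10 pp either way, since it is measured before kickoff.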