Agent Lab — a live forecast exhibit · Applied AI
Scorecard

How good is good?

A forecast is only honest if it's scored. We publish every number ahead of kickoff and grade it once the match is played — model and agent both. The hard part is knowing what a good score even is, so we anchor it explicitly.

The floor is a naive prior: always predict the long-run base rate. Beating it is the minimum bar. The sharp market is the practical ceiling: once its odds are converted to fair probabilities, it is hard to beat. The model is judged by where it lands between the two; the agent is judged only by whether its calls improve the model's score.
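As a rough illustration of what converting odds to fair probabilities can look like, here is a minimal sketch. It assumes decimal odds and simple proportional normalisation, which is only one of several ways to strip a bookmaker's margin; the exhibit does not say which method it uses. The function name and the example prices are hypothetical.

```python
def fair_probabilities(decimal_odds):
    """Turn decimal odds into probabilities that sum to 1.

    Assumption: the margin is removed by proportional normalisation,
    i.e. each implied probability is divided by their total.
    """
    if any(o <= 1.0 for o in decimal_odds):
        raise ValueError("decimal odds must be greater than 1.0")
    implied = [1.0 / o for o in decimal_odds]  # raw implied probabilities
    overround = sum(implied)                   # exceeds 1 when a margin is present
    return [p / overround for p in implied]


# Hypothetical home / draw / away prices, invented for illustration.
odds = [2.10, 3.40, 3.60]
probs = fair_probabilities(odds)
print([round(p, 3) for p in probs])
```

The raw implied probabilities add up to more than 1 because of the margin; dividing by that total rescales them so they can be scored on the same footing as the model's probabilities.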

The model, against its benchmarks

4 matches scored
Floor · naive prior: Brier 0.582 · 75.0% top-pick

Always predict the base rate. The minimum bar.

Our model: Brier 0.580 · 75.0% top-pick

Beating the floor by 0.002 (0.580 vs 0.582). Lower Brier is better.
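For readers who want to see how a Brier score and a top-pick rate could be computed, here is a minimal sketch. It assumes the three-outcome (home / draw / away) form of the Brier score, summed over outcomes and averaged over matches, so 0 is perfect and 2 is the worst case; the exhibit does not state its exact formula. All probabilities and results below are invented and are not the exhibit's data.

```python
def brier_score(forecasts, outcomes):
    """Mean multi-class Brier score over a set of matches.

    forecasts: list of probability lists, one per match.
    outcomes:  list of indices of the outcome that actually happened.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need one outcome per forecast, and at least one match")
    total = 0.0
    for probs, outcome in zip(forecasts, outcomes):
        # Squared error against a one-hot vector for the realised outcome.
        total += sum(
            (p - (1.0 if i == outcome else 0.0)) ** 2
            for i, p in enumerate(probs)
        )
    return total / len(forecasts)


def top_pick_accuracy(forecasts, outcomes):
    """Share of matches where the highest-probability outcome occurred."""
    hits = 0
    for probs, outcome in zip(forecasts, outcomes):
        top_pick = max(range(len(probs)), key=probs.__getitem__)
        hits += top_pick == outcome
    return hits / len(forecasts)


# Invented example: index 0 = home win, 1 = draw, 2 = away win.
base_rate = [0.45, 0.27, 0.28]             # hypothetical long-run base rate
outcomes = [0, 2, 0, 1]                    # hypothetical results
floor_forecasts = [base_rate] * len(outcomes)
model_forecasts = [
    [0.55, 0.25, 0.20],
    [0.30, 0.30, 0.40],
    [0.40, 0.30, 0.30],
    [0.35, 0.33, 0.32],
]

floor = brier_score(floor_forecasts, outcomes)
model = brier_score(model_forecasts, outcomes)
print(f"floor {floor:.3f}  model {model:.3f}")
```

Because the floor repeats the same probabilities for every match, its top pick never changes; a model can match the floor's top-pick rate and still score a lower Brier by being better calibrated.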

Sharp market · benchmark
Per-match prices pending

We publish the model beside each market venue and score them together as results land.

The full record

The agent, scored

Matches analysed: 72/72
Picks overturned: 1
Calls graded: 4
Agent top-pick: 75.0%

Over 4 graded calls, the agent's top pick was right 75.0% of the time (3 of 4). Each call's match page also shows a signed score delta versus the model, indicating whether that specific move helped or hurt. The agent's biggest nudge so far: France v Iraq, +12.1 pp.
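One way a signed per-call delta might be computed is sketched below. It assumes the delta is the model's single-match Brier score minus the agent's, so a positive value means the agent's adjustment helped; the exhibit does not spell out its sign convention or formula. The function names and probabilities are hypothetical.

```python
def match_brier(probs, outcome):
    """Multi-class Brier score for one match (0 is perfect, 2 is worst)."""
    return sum(
        (p - (1.0 if i == outcome else 0.0)) ** 2
        for i, p in enumerate(probs)
    )


def score_delta(model_probs, agent_probs, outcome):
    """Signed improvement of the agent over the model on one match.

    Assumed convention: positive means the agent's call scored better
    (lower Brier) than the model's original probabilities.
    """
    return match_brier(model_probs, outcome) - match_brier(agent_probs, outcome)


# Invented example: the agent shifts probability toward the home side.
model_probs = [0.50, 0.28, 0.22]   # home / draw / away
agent_probs = [0.60, 0.24, 0.16]

helped = score_delta(model_probs, agent_probs, outcome=0)  # home side won
hurt = score_delta(model_probs, agent_probs, outcome=2)    # away side won
print(f"home win: {helped:+.4f}  away win: {hurt:+.4f}")
```

The same nudge is rewarded when the favoured outcome occurs and penalised when it does not, which is why the delta is reported per call rather than only in aggregate.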