How good is good?
A forecast is only honest if it's scored. We publish every number ahead of kickoff and grade it once the match is played — model and agent both. The hard part is knowing what a good score even is, so we anchor it explicitly.
The floor is a naive prior: always predict the long-run base rate. Beating it is the minimum bar. The sharp market is the practical ceiling: hard to beat once its odds are converted to fair probabilities. The model is judged by where it lands between the two; the agent is judged only by whether its calls improve the model's score.
Why soccer is hard →
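Converting odds to fair probabilities means removing the venue's margin so that the outcomes sum to 1. Here is a minimal Python sketch, assuming decimal odds and the simple proportional method. The method actually used is not stated on this page, and `devig_proportional` and the example prices are hypothetical.

```python
def devig_proportional(decimal_odds):
    """Scale implied probabilities so they sum to 1 (proportional de-vig).

    Assumption: proportional scaling is one common choice; other
    de-vig methods exist and may give slightly different numbers.
    """
    if any(o <= 1.0 for o in decimal_odds):
        raise ValueError("decimal odds must be greater than 1.0")
    implied = [1.0 / o for o in decimal_odds]   # raw implied probabilities
    overround = sum(implied)                    # exceeds 1 when a margin is present
    return [p / overround for p in implied]


# Hypothetical home/draw/away prices, not taken from any real venue.
fair = devig_proportional([2.10, 3.40, 3.60])
print([round(p, 3) for p in fair])
```

Each venue would be passed through a function like this on its own, so that one venue's margin never leaks into another's probabilities.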
The model, against its benchmarks
4 matches scored
Floor: always predict the base rate. The minimum bar.
Beating the floor — lower Brier is better.
We publish the model beside each venue and score them together as results land.
Model vs market →
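The floor comparison can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the scoring code behind this page: it assumes the multi-outcome Brier form that sums squared errors over home, draw and away, and every forecast and result below is made up. `position_between` is a hypothetical way to express where a score lands between floor and ceiling.

```python
OUTCOMES = ("home", "draw", "away")


def brier(forecast, result):
    """Multi-outcome Brier score for one match: 0 is perfect, lower is better."""
    return sum((forecast[k] - (1.0 if k == result else 0.0)) ** 2 for k in OUTCOMES)


def mean_brier(forecasts, results):
    """Average Brier score over a set of scored matches."""
    scores = [brier(f, r) for f, r in zip(forecasts, results)]
    return sum(scores) / len(scores)


def position_between(floor, ceiling, score):
    """0.0 means level with the floor, 1.0 level with the ceiling (hypothetical summary)."""
    if floor == ceiling:
        raise ValueError("floor and ceiling must differ")
    return (floor - score) / (floor - ceiling)


# Hypothetical data: a fixed base-rate forecast and four model forecasts.
base_rate = {"home": 0.45, "draw": 0.25, "away": 0.30}
results = ["home", "draw", "home", "away"]
model = [
    {"home": 0.60, "draw": 0.22, "away": 0.18},
    {"home": 0.40, "draw": 0.30, "away": 0.30},
    {"home": 0.55, "draw": 0.25, "away": 0.20},
    {"home": 0.30, "draw": 0.25, "away": 0.45},
]

floor = mean_brier([base_rate] * len(results), results)
model_score = mean_brier(model, results)
print(f"floor={floor:.3f} model={model_score:.3f} beats_floor={model_score < floor}")
```

With only a handful of scored matches, a small gap between model and floor says little either way; the comparison becomes meaningful as results accumulate.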
The full record
When the model says 60%, does it happen 60% of the time? Reliability bins from the walk-forward backtest.
Champion odds as they moved across runs, and the live Brier (model 0.580 vs floor 0.582) over scored matches.
The model beside Polymarket and Kalshi, each de-vigged separately — never pooled into a consensus.
Does the agent's reading of live context improve on the model? Every move, scored once the match is played.
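The reliability question in the record above (when the model says 60%, does it happen 60% of the time?) comes down to grouping forecasts by stated probability and comparing each group's average forecast with its observed hit rate. Here is a minimal Python sketch, assuming equal-width bins; the bin count, the function name `reliability_bins` and the sample data are all hypothetical.

```python
def reliability_bins(probs, hits, n_bins=10):
    """Group forecasts into equal-width bins and compare stated vs observed.

    probs: forecast probabilities in [0, 1]
    hits:  1 if the forecast event happened, else 0
    Returns rows of (bin_low, bin_high, count, mean_forecast, observed_rate),
    skipping empty bins.
    """
    bins = [[] for _ in range(n_bins)]
    for p, hit in zip(probs, hits):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, hit))

    table = []
    for i, members in enumerate(bins):
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        rate = sum(h for _, h in members) / len(members)
        table.append((i / n_bins, (i + 1) / n_bins, len(members), mean_p, rate))
    return table


# Made-up forecasts and outcomes, for illustration only.
table = reliability_bins([0.62, 0.58, 0.61, 0.15], [1, 0, 1, 0])
for low, high, n, mean_p, rate in table:
    print(f"{low:.1f}-{high:.1f}  n={n}  forecast={mean_p:.2f}  observed={rate:.2f}")
```

A well-calibrated model shows forecast and observed columns close together in every bin that holds enough matches to be trusted.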
The agent, scored
All calls →
Over 4 graded calls, the agent's top pick was right 75.0% of the time (3 of 4). Each call also carries a signed score delta versus the model (whether that specific move helped or hurt) on its match page. Its biggest nudge so far: France v Iraq, +12.1 pp.
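A per-call signed delta of this kind can be sketched as the model's Brier score minus the agent's on the same match. This is an assumption-laden illustration: the sign convention (positive means the agent helped), the multi-outcome Brier form, the helper names and the numbers are all hypothetical and are not taken from any match page.

```python
OUTCOMES = ("home", "draw", "away")


def brier(forecast, result):
    """Multi-outcome Brier score for one match; lower is better."""
    return sum((forecast[k] - (1.0 if k == result else 0.0)) ** 2 for k in OUTCOMES)


def score_delta(model_fc, agent_fc, result):
    """Positive when the agent's adjusted forecast scored better than the model's."""
    return brier(model_fc, result) - brier(agent_fc, result)


def biggest_nudge_pp(model_fc, agent_fc):
    """Largest single-outcome shift, in signed percentage points."""
    shifts = [(agent_fc[k] - model_fc[k]) * 100 for k in OUTCOMES]
    return max(shifts, key=abs)


# Hypothetical call: the agent moves 12 pp toward the home side, and home wins.
model_fc = {"home": 0.50, "draw": 0.27, "away": 0.23}
agent_fc = {"home": 0.62, "draw": 0.22, "away": 0.16}

delta = score_delta(model_fc, agent_fc, "home")
nudge = biggest_nudge_pp(model_fc, agent_fc)
print(f"delta={delta:+.4f} nudge={nudge:+.1f} pp")
```

Scoring the delta per call, rather than only in aggregate, keeps a large nudge that hurt from hiding behind several small ones that helped.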