World Cup Forecast Lab · Methodology

Section 1

Statistical baseline

The forecast starts with a Poisson regression on every international match played since the early 1990s — 47 570 matches as of 2026-06-02, refit as new results come in. Each team has two latent skills: an attack rating (its tendency to score) and a defense rating (its tendency to concede). Matches are weighted by competition importance — World Cup finals carry far more signal than friendlies, on a multiplier scale adopted from Nate Silver's PELE methodology — and by recency, with a half-life of six years.
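The weighting described above can be sketched as an importance multiplier times an exponential recency decay. This is a minimal illustration, not the site's implementation: the six-year half-life comes from the text, but the multiplier values and the names `IMPORTANCE` and `match_weight` are hypothetical.

```python
from datetime import date

HALF_LIFE_YEARS = 6.0  # from the text: recency half-life of six years

# Hypothetical importance multipliers, for illustration only.
# The site's actual scale is not reproduced here.
IMPORTANCE = {"friendly": 1.0, "qualifier": 2.0, "world_cup_finals": 4.0}


def match_weight(played: date, competition: str, as_of: date) -> float:
    """Importance multiplier times exponential recency decay."""
    age_years = (as_of - played).days / 365.25
    recency = 0.5 ** (age_years / HALF_LIFE_YEARS)
    return IMPORTANCE[competition] * recency


as_of = date(2026, 6, 2)
w_today = match_weight(date(2026, 6, 2), "friendly", as_of)
w_six_years_ago = match_weight(date(2020, 6, 2), "friendly", as_of)
```

Under this sketch a friendly played six years before the fit date carries about half the weight of one played on the fit date.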

For each future fixture the model produces a pair of expected goal rates (λ_home, λ_away). These rates fully determine the win / draw / loss probabilities you see throughout the site.
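To make the step from goal rates to outcome probabilities concrete, here is a minimal sketch assuming independent Poisson scoring. The function names, the truncation at 15 goals, and the example rates are illustrative assumptions, not the site's code.

```python
from math import exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals at expected rate lam."""
    return exp(-lam) * lam**k / factorial(k)


def outcome_probs(lam_home: float, lam_away: float, max_goals: int = 15):
    """Return (home win, draw, away win) under independent Poisson scoring."""
    home = [poisson_pmf(k, lam_home) for k in range(max_goals + 1)]
    away = [poisson_pmf(k, lam_away) for k in range(max_goals + 1)]
    win = draw = loss = 0.0
    for h, p_h in enumerate(home):
        for a, p_a in enumerate(away):
            if h > a:
                win += p_h * p_a
            elif h == a:
                draw += p_h * p_a
            else:
                loss += p_h * p_a
    # The grid is truncated at max_goals, so renormalise the tiny shortfall.
    total = win + draw + loss
    return win / total, draw / total, loss / total


# Example rates are made up for illustration.
p_win, p_draw, p_loss = outcome_probs(1.6, 1.1)
```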

The model structure itself (independent Poisson goal counts with team-level attack/defense parameters, fit by penalised maximum likelihood) is a standard form going back to Maher (1982) and Dixon and Coles (1997).

Section 2

Simulation and uncertainty

A single match has a probability distribution; a 104-match tournament has a much wider one. To estimate champion odds we run 20,000 Monte Carlo tournaments, drawing each match's goals from its Poisson distribution and advancing teams through the group stage (with FIFA Annex C tiebreakers), the round of 32, and the rest of the knockout bracket.
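The simulation loop can be illustrated on a toy four-team knockout. This is a sketch under stated assumptions: the team ratings, `BASE_RATE`, and the coin flip standing in for extra time and penalties are all hypothetical, and the group stage and tiebreakers are omitted entirely.

```python
import random
from collections import Counter
from math import exp

# Hypothetical attack and defense multipliers for a four-team toy bracket.
ATTACK = {"A": 1.4, "B": 1.1, "C": 1.0, "D": 0.7}
DEFENSE = {"A": 0.8, "B": 0.9, "C": 1.0, "D": 1.3}  # lower concedes less
BASE_RATE = 1.3  # assumed average goals per team per match


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for small goal rates."""
    threshold, k, p = exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def play_knockout(a: str, b: str, rng: random.Random) -> str:
    goals_a = sample_poisson(BASE_RATE * ATTACK[a] * DEFENSE[b], rng)
    goals_b = sample_poisson(BASE_RATE * ATTACK[b] * DEFENSE[a], rng)
    if goals_a == goals_b:
        # Placeholder for extra time and penalties: a fair coin flip.
        return rng.choice((a, b))
    return a if goals_a > goals_b else b


def champion_odds(n_sims: int = 20_000, seed: int = 0) -> dict:
    rng = random.Random(seed)
    titles = Counter()
    for _ in range(n_sims):
        finalist_1 = play_knockout("A", "D", rng)
        finalist_2 = play_knockout("B", "C", rng)
        titles[play_knockout(finalist_1, finalist_2, rng)] += 1
    return {team: titles[team] / n_sims for team in ATTACK}


odds = champion_odds()
```

Each team's champion odds are simply the share of simulated tournaments it wins, so more simulations narrow the sampling noise but not the uncertainty in the ratings themselves.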

Team strengths are themselves uncertain: they are fit on a finite sample of matches, and skill drifts over time. A match-resampling bootstrap (≈50 replicates) turns that uncertainty into a probability band on each team's champion odds, which is why each champion number on the site comes with a range rather than a single point.
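A match-resampling bootstrap can be sketched generically as follows. In the real pipeline each replicate would presumably refit the model and rerun the simulation; here a hypothetical `estimate` callable stands in for that step, the 90% level is an assumption, and the toy data are made up.

```python
import random
from statistics import fmean
from typing import Callable, Sequence, Tuple


def bootstrap_band(
    matches: Sequence,
    estimate: Callable[[Sequence], float],
    n_replicates: int = 50,
    level: float = 0.90,  # assumed band width; the site's level is not stated
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile band from resampling matches with replacement."""
    rng = random.Random(seed)
    stats = sorted(
        estimate(rng.choices(matches, k=len(matches)))
        for _ in range(n_replicates)
    )
    lo_idx = int((1 - level) / 2 * (n_replicates - 1))
    hi_idx = (n_replicates - 1) - lo_idx
    return stats[lo_idx], stats[hi_idx]


# Toy stand-in: goals scored by one team in twelve matches, and a
# statistic (mean goals) in place of "refit and re-simulate".
goals = [0, 1, 1, 2, 3, 0, 2, 1, 4, 1, 0, 2]
band_lo, band_hi = bootstrap_band(goals, fmean)
```

With only about 50 replicates the band's endpoints are themselves noisy, which is a reasonable trade-off when each replicate is expensive.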

Section 3

Model vs market

The /vs-market page compares the model's probabilities head-to-head with prices implied by public markets. Markets aggregate information the model cannot see; the model is calibrated to historical results in ways markets are not. Neither is treated as ground truth.

Where they agree, a forecast has two independent sources behind it. Where they diverge, the gap marks a match worth a closer look — which is where the analyst comes in (section 6).
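One way to put model and market on the same footing is to normalise market prices so they sum to one, then take the difference per outcome. This is a minimal sketch: the proportional normalisation, the function names, and the example numbers are assumptions, not the site's documented method.

```python
def implied_probs(prices: dict) -> dict:
    """Scale raw market prices so the outcomes sum to 1."""
    total = sum(prices.values())
    return {outcome: price / total for outcome, price in prices.items()}


def largest_gap(model: dict, market: dict):
    """Return the outcome with the biggest absolute model-minus-market gap."""
    outcome = max(model, key=lambda k: abs(model[k] - market[k]))
    return outcome, model[outcome] - market[outcome]


# Made-up numbers for illustration.
model = {"home": 0.55, "draw": 0.25, "away": 0.20}
prices = {"home": 0.46, "draw": 0.28, "away": 0.29}  # sums to 1.03

market = implied_probs(prices)
gap_outcome, gap = largest_gap(model, market)
```

A positive gap means the model is more confident in that outcome than the market; a negative gap means the reverse.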

Section 4

Public calibration

A forecast that says "70%" should be right about 70% of the time. We test this with a walk-forward backtest over 2,117 competitive matches across eight folds, and publish the reliability diagram and scoring rules — ranked probability score (RPS) and log loss — on the /calibration page, with the underlying numbers, not only the chart.
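For reference, both scoring rules can be computed per match as below. This sketch uses the standard definitions for a three-outcome forecast; the function names and the example forecast are illustrative.

```python
from math import log

OUTCOMES = ("home", "draw", "away")  # ordered, as RPS requires


def rps(probs, outcome: str) -> float:
    """Ranked probability score for one match (0 is a perfect forecast)."""
    cum_forecast = cum_observed = score = 0.0
    for name, p in zip(OUTCOMES[:-1], probs[:-1]):
        cum_forecast += p
        cum_observed += 1.0 if name == outcome else 0.0
        score += (cum_forecast - cum_observed) ** 2
    return score / (len(OUTCOMES) - 1)


def log_loss(probs, outcome: str, eps: float = 1e-15) -> float:
    """Negative log of the probability assigned to what actually happened."""
    p = probs[OUTCOMES.index(outcome)]
    return -log(max(p, eps))


forecast = (0.5, 0.3, 0.2)  # made-up home / draw / away probabilities
match_rps = rps(forecast, "home")
match_log_loss = log_loss(forecast, "home")
```

RPS respects the ordering of outcomes, so a forecast that leaned toward a draw is penalised less after a home win than one that leaned toward an away win. Log loss only looks at the probability given to the realised outcome.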

Section 5

Source registry

Every external input — the historical results dataset, FIFA ranking snapshots, club-strength inputs, market-price feeds — is recorded in the source registry at /sources with a fetch date and an integrity hash. No hidden inputs. If a number on the site changes, the registry shows what changed and when.
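A registry entry of this kind can be sketched with a fetch date and a SHA-256 digest of the fetched bytes. The `SourceRecord` class, its field names, and the choice of SHA-256 are assumptions for illustration; the text does not specify the hash or the schema.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourceRecord:
    name: str
    fetched: date
    sha256: str  # assumed hash function; the registry's choice is not stated


def register(name: str, payload: bytes, fetched: date) -> SourceRecord:
    """Record what was fetched, when, and a digest of its exact bytes."""
    return SourceRecord(name, fetched, hashlib.sha256(payload).hexdigest())


def has_changed(record: SourceRecord, payload: bytes) -> bool:
    """True if a fresh fetch differs from the registered content."""
    return hashlib.sha256(payload).hexdigest() != record.sha256


record = register("historical_results", b"date,home,away\n", date(2026, 6, 2))
```

Comparing digests on each refetch is what lets a changed number be traced back to a specific input and date.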

Section 6

The analyst

The statistical model cannot see things that decide matches: injuries, lineup choices, suspensions, recent club-level form, manager changes, tactical shifts. Markets usually can. The largest model-vs-market gaps are where that difference shows up.

The analyst is an LLM-based pipeline that writes sourced, structured explanations of those gaps. For each match where the gap is meaningful, it produces a short note in three parts:

What the model weights heavily — drawn from the model's actual inputs.
What the model cannot see — structured signals (injuries, lineup news, club form) collected from defined sources, with citations.
What the market may be weighting differently — Polymarket price context and a sourced reading of the move.

The analyst writes explanations, not forecasts. Probabilities shown here come from the statistical model. Context is sourced and dated; it does not silently move any number.
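The three-part note maps naturally onto a small data structure in which every contextual claim carries its source and date. The class and field names below are hypothetical, a sketch of the shape rather than the pipeline's actual schema. Note that the structure holds no probability field, in keeping with the rule that context does not move any number.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourcedItem:
    text: str
    source: str  # where the claim comes from
    as_of: date  # when the source was read


@dataclass(frozen=True)
class GapNote:
    match_id: str
    model_weights: list[str]             # drawn from the model's own inputs
    unseen_by_model: list[SourcedItem]   # injuries, lineup news, club form
    market_context: list[SourcedItem]    # price context and reading of the move

    def is_fully_sourced(self) -> bool:
        """Every contextual claim must name a non-empty source."""
        items = self.unseen_by_model + self.market_context
        return all(item.source.strip() for item in items)


# Placeholder content for illustration only.
note = GapNote(
    match_id="example-match",
    model_weights=["recent competitive results"],
    unseen_by_model=[SourcedItem("example injury report", "example source", date(2026, 6, 2))],
    market_context=[],
)
```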

Section 7

Post-tournament scoring

After the final, we publish per-match and tournament-level scoring of every probability we issued: RPS, log loss, and calibration against the realised outcomes.

The analyst's notes get a separate, qualitative review — whether each note called out the factors that actually decided the match. This is not built yet; it ships after the tournament, once there are resolved matches to review against.

About this work

This site demonstrates one capability: structured, sourced explanation of disagreement between a quantitative baseline and an external signal.

Many forecasting settings have both — a statistical model (actuarial, demand, credit, underwriting) and a parallel external signal (peer benchmark, competitor pricing, consensus estimate, market price). When they diverge, the explanation is usually reconstructed by hand, case by case, with no durable record of what drove it. The pattern shown here — baseline plus a sourced note on the gap — is the same one that applies in those settings. The World Cup makes it concrete.

How this forecast is built