Prediction Engine — Improvements & Methodology

Overview

The FIFA World Cup 2026 prediction engine (tools/predict.py) uses Elo ratings, Poisson goal modeling, and Dixon-Coles low-score corrections to generate match-by-match scoreline predictions and Monte Carlo tournament simulations. This document describes the data enrichment and engine calibration performed in June 2026 to improve prediction accuracy.

1. Data Enrichment: From 5 Columns to 15

Before (v1 — minimal)

The original tools/ratings.csv contained only 5 features per team:

Column	Description
`fifa_rank`	FIFA world ranking
`elo`	Elo rating (long-run strength)
`attack_avg`	Average goals scored per match
`defense_avg`	Average goals conceded per match
`host`	Whether the team is a co-host (0/1)

These are the minimum inputs for a Poisson model — sufficient to compute a scoreline probability matrix, but blind to current context. A team's Elo reflects years of results, not whether their star striker tore an ACL last month.
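
As a rough illustration of how a scoreline probability matrix can be built from two expected-goal rates, the sketch below combines independent Poisson probabilities with the standard Dixon-Coles low-score adjustment. The function names, the example rates (1.5 and 1.1), and the renormalization step are assumptions made for illustration; only `RHO = -0.08` and the 9×9 matrix size come from this document, and the real code in tools/predict.py may differ.

```python
import math

RHO = -0.08      # Dixon-Coles parameter from the constants table
MAX_GOALS = 8    # 0..8 goals per side gives a 9x9 matrix


def poisson_pmf(k, lam):
    """Probability of exactly k goals under a Poisson(lam) model."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def dc_tau(x, y, lam, mu, rho):
    """Dixon-Coles adjustment; only the four lowest scorelines are changed."""
    if x == 0 and y == 0:
        return 1.0 - lam * mu * rho
    if x == 0 and y == 1:
        return 1.0 + lam * rho
    if x == 1 and y == 0:
        return 1.0 + mu * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0


def score_matrix(lam, mu, rho=RHO):
    """matrix[x][y] = P(home scores x, away scores y), renormalized to sum to 1."""
    raw = [
        [poisson_pmf(x, lam) * poisson_pmf(y, mu) * dc_tau(x, y, lam, mu, rho)
         for y in range(MAX_GOALS + 1)]
        for x in range(MAX_GOALS + 1)
    ]
    total = sum(map(sum, raw))
    return [[p / total for p in row] for row in raw]


def outcome_probs(matrix):
    """Collapse the matrix into (home win, draw, away win) probabilities."""
    home = sum(p for x, row in enumerate(matrix) for y, p in enumerate(row) if x > y)
    draw = sum(row[x] for x, row in enumerate(matrix))
    away = sum(p for x, row in enumerate(matrix) for y, p in enumerate(row) if x < y)
    return home, draw, away


# Hypothetical rates: home side expects 1.5 goals, away side 1.1.
home, draw, away = outcome_probs(score_matrix(1.5, 1.1))
print(f"home {home:.3f}  draw {draw:.3f}  away {away:.3f}")
```

With a negative ρ, the 0-0 and 1-1 cells gain probability at the expense of 1-0 and 0-1, so the draw share is higher than under plain independent Poisson.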

After (v2 — enriched)

Ten contextual columns were added, sourced from ESPN, FIFA.com, eloratings.net, club data (Premier League, La Liga, Serie A, Bundesliga, Ligue 1), and official national team announcements:

Column	Range	What it captures	Example
`form`	-1.0 to +1.0	Recent 10-match trajectory	Spain 0.90 (Euro 2024 champs), Belgium 0.25 (aging, inconsistent)
`injury_impact`	0.0 – 0.30	Key player unavailability	Canada 0.25 (Alphonso Davies ACL), France 0.10 (rotation concerns)
`squad_depth`	1 – 10	Quality beyond the starting XI	France 10 (can field two XI), Haiti 2 (no bench depth)
`avg_age`	years	Mean squad age	Croatia 32.0 (oldest), Spain 25.5 (youngest contender)
`wc_experience`	count	Total World Cup tournament appearances	Brazil 22 (most experienced), Jordan 0 (debut)
`set_piece_off`	1 – 10	Set-piece attacking threat	France 9 (Griezmann delivery), Qatar 4
`set_piece_def`	1 – 10	Set-piece defensive solidity	Morocco 9 (Regragui's structure), Qatar 4
`pressure_rating`	1 – 10	Composure in high-stakes / penalty situations	Argentina 10 (2022 final pens), England 5 (historical penalty woes)
`coach_rating`	1 – 10	Head coach tactical quality	Ancelotti/Brazil 10, Bielsa/Uruguay 9
`fatigue`	0.0 – 1.0	End-of-season squad tiredness	England 0.35 (EPL + CL grind), Qatar 0.10 (lighter domestic league)
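
Because these columns are hand-assigned, a small range check before running the engine can catch typos. The sketch below is one possible validator, not part of tools/predict.py: the `check_row` helper, the `team` header, and the sample rows are hypothetical, while the ranges are copied from the table above.

```python
import csv
import io

# Allowed ranges copied from the column table above.
RANGES = {
    "form": (-1.0, 1.0),
    "injury_impact": (0.0, 0.30),
    "squad_depth": (1, 10),
    "set_piece_off": (1, 10),
    "set_piece_def": (1, 10),
    "pressure_rating": (1, 10),
    "coach_rating": (1, 10),
    "fatigue": (0.0, 1.0),
}


def check_row(row):
    """Return a list of human-readable problems for one CSV row (empty if OK)."""
    problems = []
    for col, (lo, hi) in RANGES.items():
        try:
            value = float(row.get(col))
        except (TypeError, ValueError):
            problems.append(f"{col}: missing or non-numeric")
            continue
        if not lo <= value <= hi:
            problems.append(f"{col}: {value} outside [{lo}, {hi}]")
    return problems


# Hypothetical two-row sample; the real file has 15 columns and 48 teams.
SAMPLE = """team,form,injury_impact,squad_depth,set_piece_off,set_piece_def,pressure_rating,coach_rating,fatigue
Team A,0.40,0.10,8,7,7,6,8,0.20
Team B,0.40,0.45,8,7,7,6,8,0.20
"""

for row in csv.DictReader(io.StringIO(SAMPLE)):
    print(row["team"], check_row(row) or "ok")
```

To run it against the real file, replace `io.StringIO(SAMPLE)` with an open handle on tools/ratings.csv, assuming its header names match the column names in the table.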

Data Quality

Sources: All values cross-referenced against at least two independent sources (FIFA/UEFA official records, club performance data, sports analytics sites).
Calibration round: After initial assignment, 8 values were flagged and adjusted in an audit pass:

Team	Column	Initial → Corrected	Reason
Belgium	form	0.10 → 0.25	0.10 equated them to Qatar/Haiti; they qualified for the WC
France	form	0.50 → 0.65	2022 finalist with Mbappé at peak; 0.50 undervalued
Croatia	form	0.20 → 0.35	2022 bronze medalists, not in free-fall
Brazil	form	0.30 → 0.40	Ancelotti appointment signals tactical uplift
Sweden	attack_avg	1.8 → 1.5	Was equal to Netherlands/England — Isak alone doesn't justify that
Norway	attack_avg	1.8 → 1.6	Haaland-inflated; Norway as a team aren't Netherlands-level
Morocco	attack_avg	1.2 → 1.4	2022 semifinalist was undervalued offensively
Ivory Coast	defense_avg	0.7 → 0.85	Was equal to Spain/Argentina — too generous for AFCON champions

Remaining caveats: form and injury_impact are the most volatile columns — they reflect a snapshot at tournament start (June 2026) and would need updating if a key player is injured during the competition.

2. Engine Improvements

2.1 Injury Impact Scaling (Critical Fix)

Problem: The original code applied injury_impact as a direct λ multiplier:


lam *= (1.0 - min(inj, 0.30))   # Canada (0.25) lost 25% of expected goals!

A single injury cutting a team's expected goals by 25% is unrealistic: even losing Messi would not cost Argentina a quarter of its output.

Fix: Injury is now scaled by INJURY_WEIGHT = 0.40 and further reduced by squad_depth (deep benches mitigate injuries):


depth_reduction = max(0.0, (depth - DEPTH_MEAN) * DEPTH_INJ_MOD)
effective_inj = inj * INJURY_WEIGHT * (1.0 - depth_reduction)
lam *= (1.0 - min(effective_inj, 0.15))  # capped at 15%

Impact: Canada's injury penalty dropped from 25% → ~8%. France (squad_depth=10) with injury=0.10 now loses only ~3% instead of 10%.
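
The before/after arithmetic can be reproduced with a short sketch. `DEPTH_MEAN` and `DEPTH_INJ_MOD` are not given in this document, so the values below (5.0 and 0.05) are assumptions chosen because they reproduce the ~3% France figure quoted above; the actual constants in tools/predict.py may differ. The function names are also illustrative.

```python
INJURY_WEIGHT = 0.40   # from the constants table
INJURY_CAP = 0.15      # cap used in the snippet above
DEPTH_MEAN = 5.0       # ASSUMPTION: value not stated in this document
DEPTH_INJ_MOD = 0.05   # ASSUMPTION: value not stated in this document


def old_injury_multiplier(inj):
    """Original behaviour: raw injury_impact applied directly to lambda."""
    return 1.0 - min(inj, 0.30)


def new_injury_multiplier(inj, depth):
    """Scaled behaviour: weight the injury, then let bench depth soften it."""
    depth_reduction = max(0.0, (depth - DEPTH_MEAN) * DEPTH_INJ_MOD)
    effective_inj = inj * INJURY_WEIGHT * (1.0 - depth_reduction)
    return 1.0 - min(effective_inj, INJURY_CAP)


# France: injury_impact = 0.10, squad_depth = 10
print(f"old: {old_injury_multiplier(0.10):.2f}")      # 10% penalty
print(f"new: {new_injury_multiplier(0.10, 10):.2f}")  # about 3% penalty
```

Under these assumed constants, a squad at or below the mean depth gets no mitigation, so its penalty is simply `inj * INJURY_WEIGHT`.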

2.2 Form Weight Increase

Problem: FORM_WEIGHT = 0.12 was too small. The gap between Germany on fire (form=0.70, +8.4%) and Belgium in crisis (pre-audit form=0.10, +1.2%) was only 7.2 percentage points, which is barely distinguishable.

Fix: Raised to FORM_WEIGHT = 0.15. The same gap is now 9.0 percentage points (+10.5% vs +1.5%), which separates teams in good and poor form more clearly.
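
The gap figures follow from a simple linear multiplier. The sketch below assumes form enters λ as `1 + form * FORM_WEIGHT`, which is inferred from the percentages quoted above rather than copied from the engine; `form_multiplier` is an illustrative name.

```python
def form_multiplier(form, weight):
    """ASSUMED form of the nudge: a linear boost proportional to form."""
    return 1.0 + form * weight


# Compare the Germany (0.70) vs Belgium (0.10) gap under both weights.
for weight in (0.12, 0.15):
    gap = form_multiplier(0.70, weight) - form_multiplier(0.10, weight)
    print(f"FORM_WEIGHT={weight:.2f}: gap = {gap:.1%}")
# FORM_WEIGHT=0.12: gap = 7.2%
# FORM_WEIGHT=0.15: gap = 9.0%
```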

2.3 Coach Rating Effect

Problem: coach_rating was loaded but unused. Brazil under Ancelotti (10) played identically to Haiti under an unknown coach (5).

Fix: Coach rating now nudges λ by ±3% per point above/below the mean (7.0):


lam *= (1.0 + (coach - COACH_MEAN) * COACH_WEIGHT)

Brazil under Ancelotti (rating 10) gets a +9% λ boost. A team with a weak coach (rating 5) gets a 6% reduction.

2.4 Squad Age Penalty

Problem: avg_age was loaded but unused. Croatia (32.0) and Portugal (30.5) had no endurance penalty despite fielding the oldest squads.

Fix: Squads older than the 27.0 reference age receive a λ penalty of 1.2% per year above it. Very young squads (below 25.5) receive a smaller inexperience penalty at half that rate:


if age_dev > 0:
    lam *= (1.0 - age_dev * AGE_WEIGHT)       # Croatia: -6.0%
elif age_dev < -1.5:
    lam *= (1.0 + age_dev * AGE_WEIGHT * 0.5)  # mild youth penalty

Impact: Croatia's λ drops 6.0% (age 32.0), Portugal's 4.2% (30.5), and Bosnia's 3.6% (30.0). The penalty is intended to reflect the late-tournament stamina disadvantage of aging squads.
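
For reference, here is a self-contained version of the age rule that reproduces the percentages above. The name `AGE_MEAN`, the `age_multiplier` wrapper, and the 24.0-year example squad are assumptions for illustration; the thresholds and `AGE_WEIGHT` come from this document.

```python
AGE_MEAN = 27.0      # reference age quoted above (constant name is assumed)
AGE_WEIGHT = 0.012   # from the constants table


def age_multiplier(avg_age):
    """Lambda multiplier for squad age: full penalty when old, half-rate when very young."""
    age_dev = avg_age - AGE_MEAN
    if age_dev > 0:
        return 1.0 - age_dev * AGE_WEIGHT
    if age_dev < -1.5:
        # age_dev is negative here, so this also lowers lambda.
        return 1.0 + age_dev * AGE_WEIGHT * 0.5
    return 1.0


for team, age in [("Croatia", 32.0), ("Portugal", 30.5), ("Bosnia", 30.0),
                  ("Spain", 25.5), ("hypothetical young squad", 24.0)]:
    print(f"{team}: {age_multiplier(age) - 1.0:+.1%}")
```

Spain at 25.5 sits exactly on the youth threshold, so under the strict `<` comparison it receives no adjustment.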

2.5 Draw Calibration for Group Stage

Problem: The deterministic engine always picked the modal outcome. In a match with 39/30/31 win/draw/loss probabilities (Belgium vs Iran), the 39% win probability is the largest of the three, so Belgium always won. Result: 0 draws across 72 group matches, whereas real World Cups have a draw rate of about 22%.

Fix: Added DRAW_PULL = 0.15 — when the best win probability leads the draw by less than 15 percentage points, predict a draw:


if (best_win - pd) < DRAW_PULL:
    outcome = "draw"

Impact: 12 draws in 72 matches (17% draw rate), close to the historical WC rate of ~22%. Groups now produce realistic standings like 7-5-4-0 instead of uniform 9-6-3-0.
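
A minimal sketch of the full decision rule, using the Belgium vs Iran probabilities from above. The `pick_outcome` wrapper, its return labels, and the tie-break in favour of the home side are assumptions; only the `DRAW_PULL` comparison is taken from the snippet.

```python
DRAW_PULL = 0.15  # from the constants table


def pick_outcome(p_home, p_draw, p_away, draw_pull=DRAW_PULL):
    """Predict a draw when the stronger side's lead over the draw is small."""
    best_win = max(p_home, p_away)
    if (best_win - p_draw) < draw_pull:
        return "draw"
    return "home" if p_home >= p_away else "away"


print(pick_outcome(0.39, 0.30, 0.31))  # draw: lead over the draw is 9 points
print(pick_outcome(0.60, 0.22, 0.18))  # home: lead over the draw is 38 points
```

Raising `draw_pull` converts more marginal wins into draws, which is the trade-off discussed under Known Limitations.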

2.6 Knockout Penalty Shootout Model

Problem: Penalty outcomes were determined by pure Elo win-expectancy. A team's tournament experience and composure under pressure had no effect.

Fix: Penalty probability now blends three factors:


pa = (0.65 * elo_pa           # base strength
    + 0.25 * pres_pa           # pressure_rating (Argentina 10, England 5)
    + 0.10 * exp_pa)           # wc_experience (Brazil 22, Jordan 0)

Impact: Argentina (pressure=10, experience=18) wins penalty tiebreakers far more often than England (pressure=5, experience=16), matching historical patterns.
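
The blend can be sketched end to end as follows. How `pres_pa` and `exp_pa` are derived is not shown in this document, so the `share` helper (each team's value divided by the pair's total) is an assumption, as are the function names and the equal Elo ratings used to isolate the pressure and experience terms. The Elo term uses the standard logistic expectancy formula.

```python
def elo_expectancy(elo_a, elo_b):
    """Standard Elo win expectancy for team A."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))


def share(a, b):
    """ASSUMPTION: team A's share of the combined value; 0.5 if both are zero."""
    total = a + b
    return 0.5 if total == 0 else a / total


def shootout_prob(elo_a, elo_b, pres_a, pres_b, exp_a, exp_b):
    """Probability that team A wins the shootout, using the 0.65/0.25/0.10 blend."""
    elo_pa = elo_expectancy(elo_a, elo_b)
    pres_pa = share(pres_a, pres_b)
    exp_pa = share(exp_a, exp_b)
    return 0.65 * elo_pa + 0.25 * pres_pa + 0.10 * exp_pa


# Equal (hypothetical) Elo, so only pressure and experience separate the teams.
pa = shootout_prob(2000, 2000, pres_a=10, pres_b=5, exp_a=18, exp_b=16)
print(f"Argentina wins shootout: {pa:.3f}")
```

Because each component is a share that sums to 1 across the two teams, the blended probabilities for A and B also sum to 1.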

3. Prediction Robustness

Title Odds Comparison (10,000 Monte Carlo simulations)

Team	V1 (5 columns)	V3 (15 columns, calibrated)	Bookmaker consensus
Argentina	16.6%	16.8%	13–18% ✅
Spain	14.4%	15.0%	10–15% ✅
France	7.1%	7.9%	10–15% ⚠️ slightly low
Germany	6.4%	6.3%	5–8% ✅
Brazil	3.9%	5.7%	6–10% ⚠️ just below range (was much lower)
England	4.2%	4.8%	7–10% ⚠️ slightly low
Colombia	6.0%	5.9%	3–6% ✅
Morocco	2.8%	3.8%	2–4% ✅
Croatia	1.6%	1.3%	2–4% ⚠️ below range (age penalty)

Key changes:

Brazil rose 3.9% → 5.7% (form correction + Ancelotti coach bonus)
Morocco rose 2.8% → 3.8% (attack_avg corrected)
Croatia dropped 1.6% → 1.3% (age penalty for a squad averaging 32.0 years), moving further below the 2–4% consensus
France rose 7.1% → 7.9% (form correction; the remaining gap to the consensus reflects the injury and fatigue penalties)
England at 4.8% reflects their pressure_rating=5, which encodes their historical knockout fragility

Group Stage Realism

Metric	V1 (before)	V3 (after)	Real WC benchmark
Groups with 9-point winner	12/12 (100%)	7/12 (58%)	3/8 at 2018 (38%); 0/8 at 2022
Total draws in 72 matches	0 (0%)	12 (17%)	10 in 48 at 2022 (21%)
Distinct point distributions	1 pattern (9-6-3-0)	5 patterns	Typical: 6–8 patterns
Upsets (lower-Elo team wins)	0	1 (Egypt tops Group G)	5–8 per WC

Sensitivity Analysis

The nudge factors are designed to be meaningful but not dominant. Each factor's maximum impact on λ:

Factor	Max λ change	Scenario
Form	±15%	Spain (0.90) vs Qatar (0.10)
Injury	-12% (0.30 × 0.40; the 15% cap is not reached within the documented range)	Canada (0.25 raw, scaled to ~8% with depth)
Fatigue	-6%	England (0.35)
Coach	+9% / -6%	Ancelotti (10) vs unknown (5)
Age	-6%	Croatia (avg 32.0)
Set-piece	±4%	Elite vs weak set-piece team
Combined max	~30%	Worst case: poor form + injured + fatigued + old + weak coach

The 30% ceiling means even a maximally penalized top team (Elo 2100) doesn't drop below a mid-tier team (Elo 1800). Elo remains the backbone; nudges provide contextual adjustment, not overrides.

4. Workflow

Running predictions


# Single match
python tools/predict.py --match "Argentina vs Jamaica"

# Knockout match (forces a winner, uses penalty model)
python tools/predict.py --match "France vs England" --knockout

# Full group table with standings
python tools/predict.py --group C

# Monte Carlo title odds (10,000 seeded simulations)
python tools/predict.py --sim 10000

# Regenerate all data files (GroupStage.md + Simulation.md)
python tools/predict.py --all

Updating data

1. Edit tools/ratings.csv to change team ratings or contextual factors
2. If needed, adjust the model constants at the top of tools/predict.py (BASE_GOALS, ALPHA, RHO, FORM_WEIGHT, etc.)
3. Run python tools/predict.py --sim 10000 to verify title odds remain reasonable
4. Run python tools/predict.py --all to regenerate prediction files

Key constants

Constant	Value	Purpose
`BASE_GOALS`	1.35	Half of neutral match total (~2.70 goals)
`ALPHA`	0.10	Elo-to-λ sensitivity
`RHO`	-0.08	Dixon-Coles low-score correction
`HOME_ELO_BONUS`	70.0	Host nation Elo boost
`FORM_WEIGHT`	0.15	Form → λ multiplier
`INJURY_WEIGHT`	0.40	Raw injury scaling factor
`FATIGUE_WEIGHT`	0.06	Fatigue → λ reduction
`COACH_WEIGHT`	0.03	Coach quality → λ multiplier
`AGE_WEIGHT`	0.012	Squad age → λ penalty (per year above 27)
`DRAW_PULL`	0.15	Threshold for predicting draws in close matches
`PRESSURE_PENS`	0.25	Pressure rating weight in penalty shootouts
`SEED`	2026	Fixed RNG seed for reproducibility

Pipeline flow


tools/ratings.csv (15 columns × 48 teams)
    │
    ├─► Elo + attack/defense → base λ_home, λ_away
    │       (55% actual stats / 45% Elo-derived)
    │
    ├─► Form/injury/fatigue/coach/age/set-piece nudges
    │       (multiplicative adjustments, ±30% max combined)
    │
    ├─► Poisson + Dixon-Coles → 9×9 scoreline probability matrix
    │       (low-score dependency correction via ρ)
    │
    ├─► Draw calibration (DRAW_PULL) → group stage outcome
    │       (close matches → draw instead of marginal win)
    │
    └─► Monte Carlo (10,000 sims, seeded) → title odds
            (penalty shootouts blend Elo + pressure + experience)
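
To show what "seeded" means for the last stage, here is a deliberately simplified Monte Carlo sketch for a single match. It samples independent Poisson goals with the Python standard library and ignores the Dixon-Coles correction, the bracket, and shootouts, so it is not the engine's actual simulation loop. The function names and the example rates (1.8 and 0.9) are hypothetical; `SEED = 2026` comes from the constants table.

```python
import math
import random

SEED = 2026  # from the constants table


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) sample with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_match(lam_home, lam_away, n_sims=10_000, seed=SEED):
    """Estimate home/draw/away frequencies from repeated simulated scorelines."""
    rng = random.Random(seed)  # private generator, so results are reproducible
    counts = {"home": 0, "draw": 0, "away": 0}
    for _ in range(n_sims):
        home_goals = sample_poisson(rng, lam_home)
        away_goals = sample_poisson(rng, lam_away)
        if home_goals > away_goals:
            counts["home"] += 1
        elif home_goals < away_goals:
            counts["away"] += 1
        else:
            counts["draw"] += 1
    return {outcome: n / n_sims for outcome, n in counts.items()}


print(simulate_match(1.8, 0.9))
```

Using a dedicated `random.Random(seed)` instance rather than the module-level generator keeps repeated runs identical even if other code also draws random numbers.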

5. Known Limitations

1. France (7.9%) and England (4.8%) are below bookmaker consensus. This is a deliberate modeling choice — France's fatigue/injury risk and England's penalty fragility are real factors that bookmakers may underweight.

2. Draw rate (17%) is still below real WC rate (~22%). The DRAW_PULL threshold could be raised further, but this risks over-predicting draws in matches where a clear favorite exists.

3. No in-tournament adaptation. The model cannot adjust for events during the World Cup (red cards, injuries sustained during the tournament, momentum shifts). All predictions are pre-tournament.

4. Contextual data is a snapshot. form, injury_impact, and fatigue reflect the state as of June 2026 tournament start. These would need manual updates if significant news breaks before kickoff.

5. Single-elimination bracket seeding uses Elo-ranking order, not the actual FIFA bracket structure (group winners vs runners-up crossovers). The Monte Carlo simulation approximates but doesn't replicate the exact knockout draw path.

*Engine: tools/predict.py (pure Python stdlib, no external dependencies)*

*Data: tools/ratings.csv (15 columns × 48 teams)*

*Last calibrated: June 10, 2026*
