Calibration
Calibration measures whether predicted probabilities match observed frequencies over many forecasts.
Definition
Calibration describes how well predicted probabilities match real-world frequencies. If you label events as 70 percent likely, then about 70 percent of them should happen over a large sample. More generally, among all events you assign probability p, the fraction that occur should approach p as the number of forecasts grows.
Why it matters
Calibration is what separates confidence from accuracy. A forecaster can be directionally correct yet poorly calibrated, meaning the probabilities are systematically too high or too low. Calibration issues can also explain why a user feels consistently surprised by outcomes.
How to check calibration
• Group forecasts into probability bins, such as 10-percentage-point ranges (0 to 10 percent, 10 to 20 percent, and so on).
• Compare the average predicted probability to the observed frequency in each bin, as sketched below.
• Track how the gap between predicted and observed frequencies changes over time.
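A minimal sketch of this binning check in Python, assuming forecasts arrive as an array of probabilities in [0, 1] paired with an array of binary outcomes; the function name calibration_table, the equal-width bins, and the simulated data are illustrative assumptions, not a prescribed workflow.

```python
import numpy as np

def calibration_table(probs, outcomes, n_bins=10):
    """Group forecasts into equal-width probability bins and compare
    the mean predicted probability to the observed frequency in each."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    # Assign each forecast to a bin: 0 -> [0.0, 0.1), ..., 9 -> [0.9, 1.0].
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    rows = []
    for b in range(n_bins):
        mask = bins == b
        n = int(mask.sum())
        if n == 0:
            continue  # skip empty bins rather than dividing by zero
        rows.append((b / n_bins, (b + 1) / n_bins, n,
                     probs[mask].mean(), outcomes[mask].mean()))
    return rows

# Illustration: 500 simulated forecasts from a perfectly calibrated source.
rng = np.random.default_rng(0)
p = rng.uniform(0.0, 1.0, 500)
y = rng.uniform(0.0, 1.0, 500) < p   # each event occurs with its stated probability
for lo, hi, n, avg_p, freq in calibration_table(p, y):
    print(f"{lo:.1f}-{hi:.1f}  n={n:3d}  predicted={avg_p:.2f}  observed={freq:.2f}")
```

On data from a perfectly calibrated forecaster, the predicted and observed columns agree to within sampling noise; on real forecast logs, persistent gaps in one direction signal systematic over- or underconfidence.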
Common pitfalls
Small samples: Calibration estimates need many forecasts to stabilize; with only a few forecasts per bin, observed frequencies are dominated by noise (see the interval sketch after this list).
Mixing categories: Sports, politics, and macro events can behave differently, so pooling them can hide miscalibration within any single domain.
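To make the small-samples pitfall concrete, one option is to attach a binomial confidence interval to each bin's observed frequency. The sketch below uses the Wilson score interval; the function name wilson_interval and the 95 percent confidence level are illustrative assumptions, not something the text prescribes.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for an observed frequency successes / n
    (z = 1.96 corresponds to a 95 percent confidence level)."""
    if n == 0:
        return (0.0, 1.0)  # no data: the frequency could be anything
    phat = successes / n
    denom = 1 + z**2 / n
    center = (phat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(phat * (1 - phat) / n + z**2 / (4 * n**2)) / denom
    return (center - half, center + half)

# The same observed frequency of 0.70 carries very different evidence
# depending on how many forecasts sit in the bin:
for n in (10, 100, 1000):
    lo, hi = wilson_interval(round(0.7 * n), n)
    print(f"n={n:4d}  observed=0.70  95% CI=({lo:.2f}, {hi:.2f})")
```

With 10 forecasts in a bin, the interval spans roughly 0.40 to 0.89, so an observed frequency of 0.70 is compatible with substantial miscalibration; only with hundreds of forecasts does the interval tighten enough to support a conclusion.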
Learn more
For calibration tables and batch evaluation workflows, see BrierScore.com.