Calibration: What "Good Probabilities" Actually Means
What calibration is
Calibration means your probabilities match reality over time. Across all the times you say 70 percent, about 70 percent of those outcomes should happen.
This is not about one forecast. It is about patterns across many forecasts. Calibration is a property of your forecasting process.
Why calibration matters
In prediction markets, you trade against the market's implied probability. If your predicted probability is miscalibrated, your edge is often imaginary.
Miscalibration shows up as:
• Paying too much for contracts you believe are underpriced.
• Taking trades that do not clear the break-even probability after fees and execution costs.
• Oversizing positions because you think outcomes are more certain than they are.
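As a rough illustration, here is a minimal sketch (the price and probabilities below are made up) of how an overconfident probability inflates perceived edge:

```python
# Hypothetical numbers: a YES contract priced at 0.62, i.e. a 62 percent implied probability.
price = 0.62       # market price / implied probability
stated_p = 0.70    # the probability you state
true_p = 0.64      # what your calibration record says your 0.70 calls actually resolve at

perceived_edge = stated_p - price   # looks like 8 points of edge
real_edge = true_p - price          # only 2 points, before any costs

print(f"perceived edge: {perceived_edge:+.2f}")
print(f"real edge:      {real_edge:+.2f}")
```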
Calibration vs sharpness
Two terms matter together:
• Calibration: do your stated probabilities match frequencies?
• Sharpness: do you make informative probabilities, or does everything drift toward 50 percent?
You can be sharp and wrong (overconfident). You can be safe and useless (underconfident). The goal is to be both calibrated and sharp.
The simplest calibration test you can run
Take your last N forecasts and group them into bins. For example:
• 0.55 to 0.60
• 0.60 to 0.65
• 0.65 to 0.70
• 0.70 to 0.75
For each bin, compute:
• Average predicted probability
• Observed frequency (percent that happened)
If the 0.70 to 0.75 bin averages 0.72, then about 72 percent should happen. If only 55 percent happen, you are overconfident in that region.
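A minimal sketch of this check in code, assuming your forecasts are stored as (predicted probability, outcome) pairs; the data below is toy data, not real forecasts:

```python
# Each pair is (predicted probability, outcome), where outcome is 1 if it happened, 0 if not.
forecasts = [
    (0.72, 1), (0.58, 0), (0.67, 1), (0.73, 0), (0.61, 1),
    (0.70, 1), (0.74, 0), (0.57, 1), (0.68, 0), (0.71, 1),
]

bin_edges = [(0.55, 0.60), (0.60, 0.65), (0.65, 0.70), (0.70, 0.75)]

for lo, hi in bin_edges:
    group = [(p, o) for p, o in forecasts if lo <= p < hi]
    if not group:
        continue
    avg_pred = sum(p for p, _ in group) / len(group)   # average stated probability
    obs_freq = sum(o for _, o in group) / len(group)   # fraction that actually happened
    print(f"{lo:.2f}-{hi:.2f}: n={len(group)}, predicted={avg_pred:.2f}, observed={obs_freq:.2f}")
```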
Calibration error and reliability
Calibration error quantifies how far your predicted probabilities sit from observed frequencies across bins. Closely related is reliability: the degree to which your stated probabilities can be taken at face value.
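One common way to summarize the gap is a bin-size-weighted average of the difference between predicted and observed values. A minimal sketch, assuming you already have per-bin counts from a check like the one above (the numbers are illustrative):

```python
# bins: list of (count, average predicted probability, observed frequency) per bin.
def calibration_error(bins):
    total = sum(count for count, _, _ in bins)
    return sum(count * abs(pred - obs) for count, pred, obs in bins) / total

bins = [(40, 0.57, 0.55), (55, 0.62, 0.60), (70, 0.67, 0.61), (35, 0.72, 0.55)]  # toy numbers
print(f"calibration error: {calibration_error(bins):.3f}")
```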
Common patterns:
• Overconfidence curve: high probabilities happen less often than you claim.
• Underconfidence curve: you avoid strong probabilities, and outcomes in your higher bins happen more often than you claim.
• Regime dependence: you are calibrated in stable environments but break down around news and volatility.
How scoring rules reward calibration
Calibration is measured and rewarded by proper scoring rules: over many forecasts, your expected score is best when you report what you actually believe.
Brier score
The Brier score is a common metric for binary forecasts: the mean squared difference between your predicted probability and the outcome (1 if it happened, 0 if not). Lower is better, and it rewards honest probabilities over time.
If you want a dedicated calculator and explanation, see BrierScore.com.
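The calculation itself is short. A minimal sketch for binary outcomes:

```python
# Brier score: mean squared difference between predicted probability and outcome (1 or 0).
def brier_score(probs, outcomes):
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Toy data: a 0.70 call that hit and an overconfident 0.95 call that missed.
print(brier_score([0.70, 0.95], [1, 0]))  # (0.09 + 0.9025) / 2 = 0.49625
```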
Log loss
Log loss takes the negative log of the probability you assigned to the outcome that happened. It punishes extreme probabilities hard when you are wrong, which is why 95 percent calls that fail are so damaging.
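A minimal sketch of the same idea, with toy numbers showing how one failed 95 percent call dominates the average:

```python
import math

# Log loss: negative log of the probability assigned to the outcome that happened.
def log_loss(probs, outcomes):
    return -sum(math.log(p if o == 1 else 1 - p) for p, o in zip(probs, outcomes)) / len(probs)

print(log_loss([0.70, 0.95], [1, 1]))  # both hit: about 0.20
print(log_loss([0.70, 0.95], [1, 0]))  # the failed 0.95 call alone adds -ln(0.05), about 3.0
```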
Calibration and trading decisions
Even if you do not care about scoring, calibration matters for money:
• Miscalibration breaks your fair-price threshold.
• It inflates perceived expected value.
• It causes bad decisions around costs like trading fees and spread.
In prediction markets, small edges are fragile. If your probabilities are off by a few points, costs can flip net EV.
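To make that fragility concrete, here is a minimal sketch with hypothetical numbers (the fee structure is illustrative, not any specific venue's):

```python
# Buy a YES contract at 0.62 that pays 1.00, with a hypothetical flat fee taken from winnings.
price, payout, fee = 0.62, 1.00, 0.02

def net_ev(p):
    # expected profit per contract at win probability p
    return p * (payout - price - fee) - (1 - p) * price

print(f"EV at stated 0.70:     {net_ev(0.70):+.3f}")  # looks comfortably positive
print(f"EV at calibrated 0.64: {net_ev(0.64):+.3f}")  # almost nothing left after the fee
print(f"EV at calibrated 0.62: {net_ev(0.62):+.3f}")  # negative: the edge was imaginary
```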
Practical ways to improve calibration
• Use base rates: anchor each forecast with a base rate and treat it as your prior probability.
• Update in proportion to evidence: avoid big swings driven by narrative. When in doubt, think in odds and likelihood ratios, Bayes style (see the sketch after this list).
• Track bins monthly: review calibration with the bin method above.
• Control extremes: extreme probabilities should require extreme evidence.
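A minimal sketch of the odds-form update mentioned above; the prior and likelihood ratio are made-up numbers, purely to show the mechanics:

```python
# Bayes in odds form: posterior odds = prior odds * likelihood ratio.
def update(prior_prob, likelihood_ratio):
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

base_rate = 0.30   # starting prior taken from a base rate
lr = 2.0           # evidence twice as likely if the event will happen
print(f"updated probability: {update(base_rate, lr):.2f}")  # 0.30 -> roughly 0.46
```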
Takeaway
Good probabilities are not the ones that sound confident. They are the ones that match reality when repeated. Calibration is the bridge between forecasting and trading. If you want edge that survives costs, you need predicted probabilities that have earned trust through measurement.
Related
• Predicted Probability: How to Build a Forecast You Can Trust
• Confidence vs Probability: The Fastest Way to Get Miscalibrated
• Bayes for Humans: Updating with Odds and Likelihood Ratios