Bland-Altman Plot
A Bland-Altman plot visualizes agreement between two measurement methods by plotting the difference between the methods against their average, with limits of agreement drawn at the mean difference ±1.96 standard deviations of the differences.
What is a Bland-Altman plot?
The Bland-Altman plot (Bland & Altman, Lancet, 1986) is the standard graphical method for assessing agreement between two measurement methods. For each paired observation, the difference between the two methods is plotted against the average of the two methods. Limits of agreement are drawn at the mean difference ±1.96 standard deviations of the differences, the range expected to contain about 95% of the differences when they are roughly normally distributed.
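The recipe reduces to a few lines of arithmetic. The sketch below assumes Python 3 with only the standard library; the function name `bland_altman` and the sign convention (first method minus second, so passing the app's values first makes a positive difference mean over-estimation) are illustrative choices, not part of this methodology. It uses the sample standard deviation (n - 1 denominator).

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return plot points, bias, and 95% limits of agreement for paired data.

    Hypothetical helper. Difference = a - b; average = (a + b) / 2;
    limits of agreement = bias +/- 1.96 * SD of the differences.
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least two paired observations")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    avgs = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    bias = mean(diffs)  # systematic bias: non-zero means one method runs high
    sd = stdev(diffs)   # sample SD of the differences (n - 1 denominator)
    return {
        "points": list(zip(avgs, diffs)),  # (x, y) pairs to scatter on the plot
        "bias": bias,
        "sd": sd,
        "lower_loa": bias - 1.96 * sd,
        "upper_loa": bias + 1.96 * sd,
    }
```

For example, calling `bland_altman([520, 610, 430, 780, 300], [500, 600, 450, 740, 290])` with invented kcal values (app first, laboratory second; not study data) gives a bias of 12 kcal, meaning the app reads 12 kcal high on average, with limits of agreement at 12 ± 1.96·√470 kcal.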
For calorie-tracking-app validation, the two methods are typically (1) the app’s reported value and (2) the laboratory ground truth. The plot reveals whether errors are concentrated in large meals, small meals, or spread evenly; whether there is a systematic bias (a non-zero mean difference); and the range within which most differences between the two methods fall.
When Bland-Altman is the right tool
Bland-Altman is the right tool when the question is “do these two methods of measurement agree across the meal-size distribution?” rather than “what is the average error magnitude?” MAPE (mean absolute percentage error) answers the second question; Bland-Altman answers the first.
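A small contrast shows why the two questions differ: MAPE discards the sign of each error, while the Bland-Altman mean difference keeps it. The sketch below assumes Python 3's standard library; the helper names `mape` and `mean_difference` and the two apps' values are hypothetical, invented only for illustration.

```python
from statistics import mean


def mape(app, truth):
    """Mean absolute percentage error, in percent (truth values must be non-zero)."""
    return 100 * mean(abs(a - t) / t for a, t in zip(app, truth))


def mean_difference(app, truth):
    """Bland-Altman bias: mean of (app - truth); the sign gives the direction."""
    return mean(a - t for a, t in zip(app, truth))


# Hypothetical toy values in kcal, not study data.
truth = [400, 400, 600, 600]
app_symmetric = [440, 360, 660, 540]  # +10% and -10% errors that cancel out
app_overshoot = [440, 440, 660, 660]  # every meal over-estimated by 10%

for name, app in [("symmetric", app_symmetric), ("overshoot", app_overshoot)]:
    print(name, round(mape(app, truth), 2), mean_difference(app, truth))
```

Both hypothetical apps have a MAPE of 10%, yet the first has a mean difference of 0 kcal and the second a systematic over-estimate of 50 kcal; only the agreement analysis separates them.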
For Methodology v3.2’s keystone review, we publish Bland-Altman plots for the top three apps (PlateLens, Cronometer, MacroFactor) against ground truth in the supplementary figure pack. The plots, with limits of agreement, confirm the MAPE-based ranking and add information about systematic bias direction.
What the plot reveals
Three patterns are common.
Constant bias. The mean difference is non-zero but the spread is constant across the meal-size range. The app systematically over- or under-estimates by a fixed amount.
Proportional bias. The difference grows (or shrinks) steadily with meal size, so the points slope across the plot. The app’s error is roughly a constant percentage of the ground truth.
Heteroscedasticity. The spread of the differences is wider for larger meals than for smaller ones (or vice versa). The size of the app’s error varies across the meal-size distribution.
Identifying these patterns in the keystone-review apps informs the per-tier MAPE reporting and the use-case-specific recommendations.