
MAPE vs MAE vs MAD: Choosing the Right Calorie Accuracy Metric

Why this publication uses MAPE for headline figures, where MAE and MAD provide complementary information, and how to read each metric without overinterpreting.

Statistical/methodology review by Inés Fortunato-Webb, MPH, BS on April 25, 2026. This article meets Methodology v3.2 standards.

The choice of accuracy metric for a consumer calorie tracker is not a minor methodological detail. It changes which app appears most accurate, by how much, and against what comparison set. Methodology v3.2 uses mean absolute percentage error (MAPE) as the headline figure and supplements with mean absolute error (MAE) in tables. Both choices are deliberate; the trade-offs are real.[1]

This article walks through the three metrics (MAPE, MAE, and MAD): their formulas, their trade-offs, and the cases where each is informative.

The three metrics, formally

Let yᵢ be the laboratory-weighed ground-truth calorie value for meal i in the n-meal test battery, and ŷᵢ be the value the app under test reports for the same meal. The three metrics are:

MAPE = (100/n) · Σ |yᵢ − ŷᵢ| / yᵢ

MAE = (1/n) · Σ |yᵢ − ŷᵢ|

MAD = (1/n) · Σ |eᵢ − m|, where eᵢ = ŷᵢ − yᵢ and m is the median of the errors eᵢ

with each sum running over the n meals in the battery. MAPE is unitless (a percentage); MAE has units of kilocalories; MAD has units of kilocalories but reflects dispersion around the median rather than mean error.[1]
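
For concreteness, here is a minimal sketch of the three formulas in Python with NumPy. The meal values are hypothetical, not drawn from the test battery, and the function names are ours.

    import numpy as np

    def mape(y_true, y_pred):
        """Mean absolute percentage error, in percent."""
        return float(100.0 * np.mean(np.abs(y_true - y_pred) / y_true))

    def mae(y_true, y_pred):
        """Mean absolute error, in kilocalories."""
        return float(np.mean(np.abs(y_true - y_pred)))

    def mad(y_true, y_pred):
        """Mean absolute deviation of the errors around their median, in kilocalories."""
        errors = y_pred - y_true
        return float(np.mean(np.abs(errors - np.median(errors))))

    # Hypothetical three-meal example: lab-weighed kcal vs. app-reported kcal.
    y_true = np.array([220.0, 540.0, 910.0])
    y_pred = np.array([205.0, 560.0, 870.0])
    print(mape(y_true, y_pred), mae(y_true, y_pred), mad(y_true, y_pred))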

When MAPE is the right metric

MAPE has three properties that make it the headline metric for consumer-app accuracy reporting.[3]

First, it normalizes across meal sizes. A 5% error on a 200-kcal snack and a 5% error on a 1,000-kcal dinner contribute equally to MAPE. This is what users intuitively want from “how accurate is the tracker”: is it consistently within X% across meals of all sizes? MAPE answers this directly.

Second, it produces a percentage that maps to user-facing intuition without conversion. “±5% MAPE” is interpretable. “±42 kcal MAE” requires the reader to convert to a percentage of their daily intake to know whether the error is acceptable.

Third, the academic literature on dietary-assessment instruments and consumer-app validation has converged on MAPE as the primary metric. The DAI 2026 Six-App Validation Study reports MAPE; the 2024 Cochrane review reports MAPE; vendor marketing claims, when they specify a metric at all, typically cite MAPE.[2] Using MAPE keeps this publication’s headline figures comparable to the literature.

Where MAPE breaks down

MAPE has known limitations.[1][3]

It is unstable as ground truth approaches zero. A 5-kcal error on a 10-kcal meal is 50% MAPE; a 5-kcal error on a 200-kcal meal is 2.5% MAPE. This non-linearity matters for batteries that include very-small-portion meals.

Methodology v3.2 mitigates this by excluding meals with ground truth below 50 kcal from the headline MAPE. The lowest ground-truth value in our 50-meal battery is 78 kcal (one cup raw spinach), which is comfortably above the threshold. For very-small-portion logging (e.g., a teaspoon of olive oil), MAPE is misleading; absolute error in kilocalories is more informative.
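
A sketch of how such an exclusion might look in code; the function name and the 50 kcal default mirror the rule described above and are illustrative, not the production implementation.

    import numpy as np

    def headline_mape(y_true, y_pred, floor_kcal=50.0):
        """MAPE over meals whose ground truth is at or above the floor.

        Meals below the floor are dropped from the headline figure
        rather than allowed to inflate the percentage error.
        """
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        keep = y_true >= floor_kcal
        return float(100.0 * np.mean(np.abs(y_true[keep] - y_pred[keep]) / y_true[keep]))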

It treats overshoots and undershoots symmetrically. For a calorie-tracker user, this is approximately right: a daily 10% overshoot and a daily 10% undershoot have similarly disruptive consequences for a deficit-targeted tracking program. For some specific applications (e.g., supervised contest prep, where undershoot risk is the limiting concern) the symmetric treatment may obscure an asymmetric clinical implication; because overshoot and undershoot are roughly symmetric for our primary use case, we do not currently use sMAPE.[3]

It carries non-trivial sampling uncertainty at realistic battery sizes. With a 50-meal battery, the bootstrap confidence interval on MAPE is roughly 1.5-2 percentage points wide for apps in the tight band. Distinguishing ±5% from ±7% is reliable; distinguishing ±5% from ±5.5% is not.
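
A percentile bootstrap over meals is one way to produce such an interval. This sketch assumes simple resampling with replacement and uses our own function name; the exact v3.2 bootstrap variant may differ in detail.

    import numpy as np

    def bootstrap_mape_ci(y_true, y_pred, n_boot=10_000, alpha=0.05, seed=0):
        """Percentile bootstrap CI for MAPE: resample meals with replacement."""
        rng = np.random.default_rng(seed)
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        pct_errors = 100.0 * np.abs(y_true - y_pred) / y_true
        n = len(pct_errors)
        idx = rng.integers(0, n, size=(n_boot, n))   # n_boot resampled batteries
        stats = pct_errors[idx].mean(axis=1)         # MAPE of each resample
        return np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])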

When MAE is the right metric

MAE preserves the absolute size of the error in kilocalories. For per-meal interpretation, this is sometimes more informative than MAPE.

A user evaluating an app for a 200-kcal-deficit cut is asking: how big is the typical error on my daily intake? If their daily intake is 1,800 kcal, a ±5% MAPE produces a typical error of ±90 kcal, which is comparable to their deficit. If their daily intake is 3,200 kcal (a competitive-cycle athlete), the same ±5% MAPE produces a typical error of ±160 kcal, which still permits interpretable deficit work but with less headroom.

MAE makes this scale dependence visible. We report MAE alongside MAPE in supplementary tables for the keystone review and for use-case-specific articles.
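
The arithmetic above reduces to a one-line helper. This is a back-of-envelope conversion that assumes the per-meal percentage error carries over unchanged to the daily total; the function name is ours.

    def typical_daily_error_kcal(mape_percent, daily_intake_kcal):
        """Rough daily kcal error implied by a given MAPE at a given intake."""
        return daily_intake_kcal * mape_percent / 100.0

    print(typical_daily_error_kcal(5.0, 1800))  # 90.0
    print(typical_daily_error_kcal(5.0, 3200))  # 160.0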

When MAD is informative

MAD is the dispersion of the error distribution around its median. It can be computed on raw kilocalorie errors or, for direct comparison with MAPE, on the per-meal percentage errors, which is how we quote it below. It is informative when the question is not “what is the typical error” but “how concentrated is the error distribution”.[1]

An app with MAPE 8% and MAD 4% has errors clustered tightly around the typical 8%; an app with MAPE 8% and MAD 12% has errors that are sometimes much smaller and sometimes much larger than the typical 8%. The two distributions have the same headline MAPE but different operational implications. The first is more predictable; the second is more variable.
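
A toy illustration of the distinction, using hypothetical per-meal percentage errors rather than battery data: both sets average 8%, but their dispersion around the median differs sharply.

    import numpy as np

    def mad_around_median(errors):
        """Mean absolute deviation of a set of errors around their median."""
        errors = np.asarray(errors, dtype=float)
        return float(np.mean(np.abs(errors - np.median(errors))))

    tight  = np.array([7.0, 8.0, 8.0, 9.0, 8.0])    # clustered around 8%
    spread = np.array([1.0, 2.0, 8.0, 14.0, 15.0])  # sometimes far from 8%
    print(tight.mean(), mad_around_median(tight))    # 8.0 0.4
    print(spread.mean(), mad_around_median(spread))  # 8.0 5.2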

MAD is reported in our supplementary tables for the keystone review apps but is not part of the headline figure.

Bland-Altman as a complementary view

For some specific questions, neither MAPE, MAE, nor MAD is the right answer. The question “do these two methods of measurement agree?” — for example, “does the app’s daily total agree with the ground-truth daily total?” — is best answered with a Bland-Altman plot.[5]

Bland-Altman plots the difference between two methods against the average of the two methods, with limits of agreement at ±1.96 standard deviations from the mean difference. The plot reveals whether errors are concentrated at high values, low values, or spread evenly; whether there is a systematic bias (the mean difference is non-zero); and the limits within which the two methods agree.
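
A minimal sketch of the construction described above, using NumPy and Matplotlib; this is a generic Bland-Altman plot, not the pipeline that produces our published figure pack.

    import numpy as np
    import matplotlib.pyplot as plt

    def bland_altman(y_true, y_pred, ax=None):
        """Plot difference vs. mean of two methods, with 95% limits of agreement."""
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        mean = (y_true + y_pred) / 2.0
        diff = y_pred - y_true
        bias = diff.mean()                # systematic bias: non-zero mean difference
        half = 1.96 * diff.std(ddof=1)    # half-width of the limits of agreement
        ax = ax or plt.gca()
        ax.scatter(mean, diff, s=20)
        ax.axhline(bias, color="gray")
        ax.axhline(bias - half, color="gray", linestyle="--")
        ax.axhline(bias + half, color="gray", linestyle="--")
        ax.set_xlabel("Mean of app and ground truth (kcal)")
        ax.set_ylabel("App minus ground truth (kcal)")
        return bias, (bias - half, bias + half)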

For the keystone review, we publish Bland-Altman plots for the top three apps (PlateLens, Cronometer, MacroFactor) against ground truth. The plots, with limits of agreement, are in the supplementary figure pack. They confirm the MAPE-based ranking and add information about systematic bias direction.

What this means for reading our published numbers

When this publication reports a headline accuracy figure, the metric is MAPE with 95% bootstrap confidence intervals. When the supplementary tables report MAE in kilocalories, the unit is per-meal absolute error. When the supplementary plots show Bland-Altman, the question is method-method agreement, not best-tracker selection.

A reader using these numbers for an operational decision should:

  1. Use MAPE for “how accurate is this tracker, as a percentage, on a typical meal.”
  2. Use MAE for “how big is the typical per-meal error in kilocalories I would experience.”
  3. Use Bland-Altman if the question is whether two trackers agree with each other across the meal-size distribution.
  4. Use the bootstrap CIs for “is the difference between two trackers statistically meaningful at this sample size.”

Methodology v3.2 publishes all four. The interpretive judgment is the reader’s.

Frequently asked questions

Why does this publication use MAPE for headline figures?

MAPE normalizes across meal sizes, treats overshoots and undershoots equally, and produces a percentage that readers can interpret directly without conversion. The trade-off is that MAPE has known instability when ground-truth values are very small (near-zero meals); we mitigate this by excluding meals with ground truth below 50 kcal from the headline figure.

When does MAPE break down?

MAPE divides by ground truth, so it explodes when ground truth approaches zero. In our 50-meal battery, the lowest ground-truth value is 78 kcal (1 cup raw spinach). This is high enough that the standard MAPE formula is well-behaved.

What does MAE add?

MAE preserves the absolute size of the error in calories. A 5% MAPE on a 200-kcal meal is a 10-kcal error; on an 800-kcal meal it is a 40-kcal error. MAE makes this scale dependence visible. We report it in supplementary tables to support specific use-case judgments.

What's the difference between MAE and MAD?

MAE (mean absolute error) and MAD (mean absolute deviation) are sometimes used interchangeably. In our usage, MAE is the mean absolute prediction error against ground truth; MAD is the mean absolute deviation around the median, used as a dispersion measure for the set of errors. They answer different questions.

Should I use sMAPE?

Symmetric MAPE addresses the asymmetric-penalty issue when overshoot and undershoot have different operational consequences. We do not currently use sMAPE because the user-side consequences of overshoot and undershoot are roughly symmetric for daily-deficit calorie tracking. v3.3 may revisit this.
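
For reference, the most widely used sMAPE variant divides each error by the average of the actual and reported values. Several definitions circulate in the forecasting literature, so treat this sketch as one common choice rather than the canonical formula.

    import numpy as np

    def smape(y_true, y_pred):
        """Symmetric MAPE (one common variant), in percent."""
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
        return float(100.0 * np.mean(np.abs(y_pred - y_true) / denom))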

References

  1. Hyndman, R. & Koehler, A. Another look at measures of forecast accuracy. International Journal of Forecasting, 2006. · DOI: 10.1016/j.ijforecast.2006.03.001
  2. Six-App Validation Study (DAI-VAL-2026-01). Dietary Assessment Initiative, March 2026.
  3. Tofallis, C. A better measure of relative prediction accuracy for model selection and model estimation. JORS, 2015. · DOI: 10.1057/jors.2014.103
  4. Armstrong, J.S. & Collopy, F. Error measures for generalizing about forecasting methods. International Journal of Forecasting, 1992. · DOI: 10.1016/0169-2070(92)90008-W
  5. Bland, J.M. & Altman, D.G. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, 1986. · DOI: 10.1016/S0140-6736(86)90837-8

Editorial standards. This publication follows the documented Methodology v3.2 rubric and a transparent editorial policy. We accept no compensation from app makers; see our no-affiliate disclosure.