Assessing Bias in the Evaluation of Blood Glucose Prediction Models

Rodríguez León, Ciro; Avilés-Pérez, María Dolores; Baños Legrán, Oresti; Lopez-Ibarra Lozano, Pablo J.; Muñoz Torres, Manuel Eduardo; Quesada Charneco, Miguel; Villalonga Palliser, Claudia

doi:10.1007/978-3-032-02725-2_51

2025_IWANN2025_Bias-postprint.pdf (1.705Mb)

Identifiers

URI: https://hdl.handle.net/10481/112114

DOI: 10.1007/978-3-032-02725-2_51

ISBN: 978-3-032-02725-2

Publisher

Springer

Subject

Deep Learning

Type 1 diabetes

Glucose Prediction

Date

2025-10-01

Bibliographic reference

Rodriguez-Leon, C. et al. (2026). Assessing Bias in the Evaluation of Blood Glucose Prediction Models. In: Rojas, I., Joya, G., Catala, A. (eds) Advances in Computational Intelligence. IWANN 2025. Lecture Notes in Computer Science, vol 16008, pp 653–663. Springer, Cham. https://doi.org/10.1007/978-3-032-02725-2_51

Sponsorship

Project PID2023-148188OA-I00 "RELIEF-T1D", funded by MICIU/AEI/10.13039/501100011033 and ERDF, EU.

Abstract

Diabetes mellitus (DM) poses a critical global health challenge, and patients with type 1 diabetes (T1D) face unique difficulties in maintaining a safe blood glucose level (BGL). This work demonstrates that evaluating BGL prediction models without distinguishing between BGL ranges (hypoglycemia, normoglycemia, and hyperglycemia) introduces bias into the assessment of prediction results. Data are obtained from the T1DiabetesGranada dataset, which comprises over 22.5 million measured BGL values recorded at 15-min intervals, and are preprocessed into a uniform format for supervised learning. Time series are segmented into windows with a 2-h history length and prediction horizons of 30 and 60 min. An LSTM architecture is used to predict BGL values because of its ability to capture temporal dependencies. The evaluation combines traditional non-clinical metrics (RMSE, MAE, MAPE) with clinical metrics derived from the Clarke Error Grid. The proposed evaluation strategy assesses the performance of BGL prediction models not only across the entire BGL range but also within each individual BGL range. Results indicate that metrics computed over the entire BGL range may suggest satisfactory model performance, while significant deficiencies emerge in the hypoglycemic range. Conventional evaluation strategies may therefore overestimate the capabilities of BGL prediction models. These findings highlight the need for a comprehensive, range-specific evaluation strategy to avoid bias, especially when evaluating clinically critical regions.
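
The windowing and range-specific evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The thresholds of 70 and 180 mg/dL are common clinical conventions assumed here, since the abstract does not state the cut-offs used. The function names (make_windows, error_metrics, glycemic_range, stratified_metrics) and the toy values are hypothetical.

```python
import math

# Assumed range limits in mg/dL; the abstract does not specify them.
HYPO_MAX = 70.0    # below this: hypoglycemia
HYPER_MIN = 180.0  # above this: hyperglycemia


def make_windows(series, history=8, horizon=2):
    """Split a series into (history window, target) pairs.

    With 15-min sampling, history=8 covers 2 h; horizon=2 is 30 min
    ahead and horizon=4 is 60 min ahead.
    """
    pairs = []
    last_start = len(series) - history - horizon
    for i in range(last_start + 1):
        window = list(series[i:i + history])
        target = series[i + history + horizon - 1]
        pairs.append((window, target))
    return pairs


def error_metrics(y_true, y_pred):
    """RMSE, MAE and MAPE for paired lists of positive BGL values."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    n = len(errors)
    return {
        "n": n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100.0 * sum(abs(e) / t for e, t in zip(errors, y_true)) / n,
    }


def glycemic_range(bgl):
    """Label a measured BGL value using the assumed thresholds."""
    if bgl < HYPO_MAX:
        return "hypoglycemia"
    if bgl > HYPER_MIN:
        return "hyperglycemia"
    return "normoglycemia"


def stratified_metrics(y_true, y_pred):
    """Compute metrics over all samples and per range of the true value."""
    groups = {}
    for t, p in zip(y_true, y_pred):
        for key in ("all", glycemic_range(t)):
            trues, preds = groups.setdefault(key, ([], []))
            trues.append(t)
            preds.append(p)
    return {k: error_metrics(t, p) for k, (t, p) in groups.items()}


# Toy values (not from the dataset): errors occur only in hypoglycemia.
y_true = [50.0, 60.0, 100.0, 150.0, 200.0, 250.0]
y_pred = [70.0, 80.0, 100.0, 150.0, 200.0, 250.0]
for name, scores in stratified_metrics(y_true, y_pred).items():
    print(name, scores)
```

In this toy case the error over the whole range is noticeably lower than the error within the hypoglycemic range, which is the kind of masking effect the abstract describes. Samples are grouped by the measured value rather than the predicted one, so each range reflects how the model behaves when the patient is actually in that state.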

Collections

DICAR - Comunicaciones Congresos, Conferencias, ...