Assessing Bias in the Evaluation of Blood Glucose Prediction Models
Metadata
Author
Rodríguez León, Ciro; Avilés-Pérez, María Dolores; Baños Legrán, Oresti; Lopez-Ibarra Lozano, Pablo J.; Muñoz Torres, Manuel Eduardo; Quesada Charneco, Miguel; Villalonga Palliser, Claudia
Publisher
Springer
Subject
Deep Learning; Type 1 diabetes; Glucose Prediction
Date
2025-10-01
Bibliographic reference
Rodriguez-Leon, C. et al. (2026). Assessing Bias in the Evaluation of Blood Glucose Prediction Models. In: Rojas, I., Joya, G., Catala, A. (eds) Advances in Computational Intelligence. IWANN 2025. Lecture Notes in Computer Science, vol 16008, pp 653–663. Springer, Cham. https://doi.org/10.1007/978-3-032-02725-2_51
Sponsorship
PID2023-148188OA-I00 project "RELIEF-T1D", which is funded by MICIU/AEI/10.13039/501100011033 and ERDF, EU.
Abstract
Diabetes mellitus (DM) poses a critical global health challenge, and patients with type 1 diabetes (T1D) face particular difficulties in maintaining a safe blood glucose level (BGL). This work demonstrates that evaluating BGL prediction models without distinguishing between BGL ranges (hypoglycemia, normoglycemia, and hyperglycemia) biases the assessment of prediction results. Data are obtained from the T1DiabetesGranada dataset, comprising over 22.5 million measured BGL values recorded at 15-min intervals, and are preprocessed into a uniform format for supervised learning. Time series are segmented into windows with a 2-h history length and prediction horizons of 30 and 60 min. An LSTM architecture is used to predict BGL values because of its ability to capture temporal dependencies. The evaluation combines traditional non-clinical metrics (RMSE, MAE, MAPE) with clinical metrics derived from the Clarke Error Grid. The proposed evaluation strategy assesses the performance of BGL prediction models not only across the entire BGL range but also within each BGL range. Results indicate that metrics computed over the entire BGL range may suggest satisfactory model performance, whereas significant deficiencies emerge in hypoglycemic ranges, implying that conventional evaluation strategies may overestimate the capabilities of BGL prediction models. These findings highlight the need for a comprehensive evaluation strategy across BGL ranges to avoid bias, especially when evaluating clinically critical regions.
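The range-wise evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the thresholds (below 70 mg/dL for hypoglycemia, above 180 mg/dL for hyperglycemia) are a common clinical convention assumed here, the choice to assign each sample to a range by its measured (reference) value is also an assumption, and the function names and sample values are hypothetical.

```python
import math

# Assumed thresholds in mg/dL (common clinical convention; not stated in the abstract).
HYPO_MAX = 70.0
HYPER_MIN = 180.0


def bgl_range(value):
    """Label a measured BGL value as hypo-, normo-, or hyperglycemia."""
    if value < HYPO_MAX:
        return "hypoglycemia"
    if value > HYPER_MIN:
        return "hyperglycemia"
    return "normoglycemia"


def error_metrics(y_true, y_pred):
    """Compute RMSE, MAE, and MAPE (in percent) for paired values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the measured value, which is never zero for BGL.
    mape = 100.0 * sum(abs(e) / t for e, t in zip(errors, y_true)) / n
    return {"n": n, "rmse": rmse, "mae": mae, "mape": mape}


def stratified_metrics(y_true, y_pred):
    """Compute metrics over the whole BGL range and within each BGL range.

    Assumption: each sample is assigned to a range by its measured value.
    """
    groups = {"overall": ([], [])}
    for t, p in zip(y_true, y_pred):
        for key in ("overall", bgl_range(t)):
            trues, preds = groups.setdefault(key, ([], []))
            trues.append(t)
            preds.append(p)
    return {key: error_metrics(t, p) for key, (t, p) in groups.items()}


# Hypothetical values: large errors only on the two hypoglycemic samples.
measured = [60, 65, 100, 120, 150, 200, 250]
predicted = [80, 85, 102, 118, 152, 198, 252]

for name, metrics in stratified_metrics(measured, predicted).items():
    print(name, {k: round(v, 2) for k, v in metrics.items()})
```

In this toy data the overall MAE is diluted by the more numerous well-predicted samples, while the hypoglycemic MAE is much larger, which is the kind of discrepancy the abstract attributes to whole-range evaluation.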




