Toward Robust Machine Learning Models for MALDI-TOF MS: Novel Approaches for Mycobacterium abscessus Subspecies Identification

Padial-Fuillerat, Erica; Martínez Manjón, Juan Emilio; Zwir Nawrocki, Jorge Sergio Igor; Arroyo Pulgar, Manuel Jesús; Blázquez Sánchez, Mario; Rodríguez Temporal, David; Rodríguez, Belén; Mancera, Luis; Val Muñoz, María Coral Del

AMR; MALDI-TOF; Mycobacterium

This work has been supported by grant PID2024-158244OB-I00, funded by MICIU/AEI/10.13039/501100011033/FEDER, UE, and by “Ethical, Responsible and General Purpose Artificial Intelligence: Applications In Risk Scenarios” (IAFER), Exp.: TSI-100927-2023-1, funded through the Creation of university-industry research programs (Enia Programs), aimed at the research and development of artificial intelligence, for its dissemination and education within the framework of the Recovery, Transformation and Resilience Plan from the European Union Next Generation EU through the Ministry for Digital Transformation and the Civil Service. This research is also funded by the Aid for Industrial Doctorates, corresponding to the 2021 call of the State Program to develop, attract, and retain talent, within the framework of the Plan for Scientific, Technical and Innovation Research 2021–2023, financed by MCIN/AEI/10.13039/501100011033, under reference DIN2021-012063. Funding for open access charge: Universidad de Granada/CBUA.

Distinguishing Mycobacterium abscessus subspecies presents significant diagnostic challenges because of their genetic homogeneity and the variability across analytical platforms. Our research combines matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry with machine learning (ML) approaches to enhance discrimination accuracy, using 325 spectral profiles from diverse European hospitals. The analytical pipeline incorporates specialized techniques for geographical data harmonization, feature selection, and balancing class representation.
The best model employs support vector machines (SVMs) with ComBat correction, Boruta feature selection, and centroid clustering for class imbalance, achieving a 97% F1 score and 97.17% AUC-ROC on test samples. Notably, most tested models improved their discrimination performance with this approach and reached high geometric mean (GEO) and index balanced accuracy (IBA) values (>0.90), indicating consistent sensitivity and specificity across all subspecies. SHAP (SHapley Additive exPlanations) validated the biological relevance of the selected spectral features, particularly improving discrimination of the diagnostically challenging M. abscessus subsp. bolletii. This work advances the state of the art in M. abscessus classification, providing a scalable analytical framework for enhanced microbial diagnostics and targeted antimicrobial therapy selection.

2026-02-16T09:30:59Z
2026-01-23
journal article
Published version: Padial-Fuillerat E, Martínez-Manjón JE, Zwir I, Arroyo MJ, Blázquez-Sánchez M, Rodríguez-Temporal D, Rodríguez B, Mancera L, Del Val C. Toward Robust Machine Learning Models for MALDI-TOF MS: Novel Approaches for Mycobacterium abscessus Subspecies Identification. J Proteome Res. 2026 Feb 9. doi: 10.1021/acs.jproteome.5c00534
ISSN: 1535-3907; 1535-3893
https://hdl.handle.net/10481/111016
eng
http://creativecommons.org/licenses/by-sa/4.0/
open access
Attribution-ShareAlike 4.0 International
American Chemical Society
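
The geometric mean (GEO) and index balanced accuracy (IBA) cited in the abstract can be illustrated with a minimal Python sketch. It assumes the common one-vs-rest definitions, GEO = sqrt(sensitivity × specificity) and IBA = (1 + α · (sensitivity − specificity)) × sensitivity × specificity. The record does not state which variant or which α the authors used, so `alpha=0.1`, the function names, and the toy labels below are illustrative assumptions, not the paper's data or code.

```python
import math


def per_class_rates(y_true, y_pred, label):
    """One-vs-rest sensitivity and specificity for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    tn = sum(t != label and p != label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def geo_and_iba(y_true, y_pred, label, alpha=0.1):
    """Return (GEO, IBA) for one class; alpha is an assumed weighting."""
    sens, spec = per_class_rates(y_true, y_pred, label)
    geo = math.sqrt(sens * spec)
    # IBA weights the squared geometric mean by the dominance (sens - spec).
    iba = (1 + alpha * (sens - spec)) * sens * spec
    return geo, iba


# Invented toy predictions for the three M. abscessus subspecies.
y_true = ["abscessus"] * 4 + ["bolletii"] * 2 + ["massiliense"] * 4
y_pred = [
    "abscessus", "abscessus", "abscessus", "massiliense",
    "bolletii", "abscessus",
    "massiliense", "massiliense", "massiliense", "massiliense",
]

for subspecies in ("abscessus", "bolletii", "massiliense"):
    geo, iba = geo_and_iba(y_true, y_pred, subspecies)
    print(f"{subspecies}: GEO={geo:.3f} IBA={iba:.3f}")
```

In this toy example the minority class (bolletii) has perfect specificity but only half of its samples are recovered, so its GEO and IBA fall well below the other classes. That is the kind of imbalance these two metrics are designed to expose, and it is why values above 0.90 for every subspecies indicate balanced sensitivity and specificity.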