Explainable Machine Learning Models Using Robust Cancer Biomarkers Identification from Paired Differential Gene Expression
Metadata
Date
2024-11-19
Sponsor
This project was financially supported by grant PID20210125017OB-I00, funded by MCIN/AEI/10.13039/501100011033 and by “ERDF: A way of making Europe”. Elisa Diaz de la Guardia-Bolivar was funded by a doctoral fellowship, PRE2019-089807, from the Spanish Ministry of Science and Innovation (MCIN/AEI/10.13039/501100011033) and the European Social Fund (ESF), “ESF investing in your future”.
Abstract
In oncology, there is a critical need for robust biomarkers that can be easily translated
into the clinic. We introduce a novel approach using paired differential gene expression analysis for
biological feature selection in machine learning models, enhancing robustness and interpretability
while accounting for patient variability. This method compares primary tumor tissue with the same
patient’s healthy tissue, improving gene selection by eliminating individual-specific artifacts. We
focus on carcinoma because of its prevalence and the availability of data, and we aim to
identify biomarkers involved in general carcinoma progression, including less-researched types.
Our findings identified 27 pivotal genes that can distinguish between healthy and carcinoma tissue,
even in unseen carcinoma types. Additionally, the panel could precisely identify the
tissue-of-origin in the eight carcinoma types used in the discovery phase. Notably, in a proof of concept, the
model accurately identified the primary tissue origin in metastatic samples despite limited sample
availability. Functional annotation reveals these genes’ involvement in cancer hallmarks, detecting
subtle variations across carcinoma types. We propose paired differential gene expression analysis as
a reference method for the discovery of robust biomarkers.
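
The paired comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the record does not state which statistical test or software was used, so the paired t statistic on log2 ratios, the function name `paired_differential_expression`, the `pseudocount` parameter, and the toy gene names and values below are all assumptions chosen for illustration. The point it shows is that each tumor sample is compared only with the same patient's healthy tissue, so patient-specific baseline differences cancel out.

```python
import math
from statistics import mean, stdev


def paired_differential_expression(tumor, normal, pseudocount=1.0):
    """Per-gene paired comparison of tumor vs. matched healthy expression.

    tumor, normal: dict mapping gene -> list of expression values, where
    index i in both lists belongs to the same patient (hypothetical layout).
    Returns dict mapping gene -> (mean log2 fold change, paired t statistic).
    """
    results = {}
    for gene, tumor_values in tumor.items():
        normal_values = normal[gene]
        if len(tumor_values) != len(normal_values):
            raise ValueError(f"samples are not paired for gene {gene}")
        # Within-patient log2 ratio: the patient's own baseline cancels out.
        diffs = [
            math.log2(t + pseudocount) - math.log2(n + pseudocount)
            for t, n in zip(tumor_values, normal_values)
        ]
        mean_diff = mean(diffs)
        sd = stdev(diffs)  # requires at least two patients
        if sd > 0:
            # Paired t statistic: mean difference over its standard error.
            t_stat = mean_diff / (sd / math.sqrt(len(diffs)))
        elif mean_diff == 0:
            t_stat = 0.0
        else:
            # Identical non-zero shift in every patient.
            t_stat = math.copysign(math.inf, mean_diff)
        results[gene] = (mean_diff, t_stat)
    return results


# Toy data (made up): three patients, two hypothetical genes.
toy_tumor = {"GENE_A": [7, 15, 31], "GENE_B": [3, 1, 7]}
toy_normal = {"GENE_A": [1, 3, 7], "GENE_B": [1, 3, 7]}
results = paired_differential_expression(toy_tumor, toy_normal)
```

In this toy input, GENE_A is shifted by the same factor in every patient even though absolute levels differ widely between patients, which an unpaired comparison would blur. GENE_B goes up in one patient, down in another, and is unchanged in the third, so its mean paired change is zero. A real analysis would also correct for multiple testing across genes.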