Multiclass classification for skin cancer profiling based on the integration of heterogeneous gene expression series
Metadata
Author
Gálvez, Juan Manuel; Castillo, Daniel; Herrera Maldonado, Luis Javier; San Román, Belén; Valenzuela Cansino, Olga; Ortuño, Francisco Manuel; Rojas Ruiz, Ignacio
Publisher
Public Library of Science
Date
2018-05-11
Bibliographic reference
Gálvez JM, Castillo D, Herrera LJ, San Román B, Valenzuela O, Ortuño FM, et al. (2018) Multiclass classification for skin cancer profiling based on the integration of heterogeneous gene expression series. PLoS ONE 13(5): e0196836. https://doi.org/10.1371/journal.pone.0196836.
Sponsor
This work has been partially supported by the Government of Andalusia and its development is part of the research project Advanced Computer Systems in Applications in the field of Biotechnology and Bioinformatics (reference P12-TIC-2082), in collaboration with the research project "Progress in Computer Architectures for Automatic Learning using Heterogeneous Sources: Health and Well-Being Applications" (reference TIN2015-71873-R). There was no additional external funding received for this study.
Abstract
Many research studies that apply microarray technology to the characterization
of different pathological states of a disease may fail to reach statistically significant
results. This is largely due to the small number of analysed samples, and to the limitation
in the number of states or pathologies usually addressed. Moreover, the influence of potential
deviations on the gene expression quantification is usually disregarded. In spite of the
continuous changes in omic sciences, reflected for instance in the emergence of new Next-Generation
Sequencing-related technologies, the existing availability of a vast amount of
gene expression microarray datasets should be properly exploited. Therefore, this work proposes
a novel methodological approach involving the integration of several heterogeneous
skin cancer series, and a later multiclass classifier design. This approach is thus a way to
provide the clinicians with an intelligent diagnosis support tool based on the use of a robust
set of selected biomarkers, which simultaneously distinguishes among different cancer-related
skin states. To achieve this, a multi-platform combination of microarray datasets
from Affymetrix and Illumina manufacturers was carried out. This integration is expected to
strengthen the statistical robustness of the study as well as the finding of highly-reliable skin
cancer biomarkers. Specifically, the designed operation pipeline has allowed the identification
of a small subset of 17 differentially expressed genes (DEGs) from which to distinguish
among 7 involved skin states. These genes were obtained from the assessment of a number
of potential batch effects on the gene expression data. The biological interpretation of these
genes was inspected in the specific literature to understand their underlying information in
relation to skin cancer. Finally, in order to assess their possible effectiveness in cancer diagnosis,
a cross-validation Support Vector Machines (SVM)-based classification including feature
ranking was performed. The accuracy attained exceeded 92% in overall recognition
of the 7 different cancer-related skin states. The proposed integration scheme is expected
to allow the co-integration with other state-of-the-art technologies such as RNA-seq.
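
The evaluation step summarized above (cross-validated multiclass classification with feature ranking) can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the ranking score is a simple between-class to within-class variance ratio, and a nearest-centroid classifier stands in for the SVM so that the example needs only the Python standard library. All function names and parameters are hypothetical. The point it shows is that genes are ranked inside each training fold, so the held-out samples never influence which genes are selected.

```python
import random
from statistics import mean, pvariance


def make_toy_data(n_per_class=20, n_classes=3, n_genes=30, n_informative=5, seed=0):
    """Synthetic stand-in for an integrated expression matrix (not the paper's data)."""
    rng = random.Random(seed)
    X, y = [], []
    for c in range(n_classes):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in range(n_informative):
                # Class-dependent shift on the first n_informative genes only.
                row[g] += 3.0 * c * (1 if g % 2 == 0 else -1)
            X.append(row)
            y.append(c)
    return X, y


def rank_features(X, y):
    """Rank genes by between-class variance over mean within-class variance."""
    classes = sorted(set(y))
    scores = []
    for g in range(len(X[0])):
        groups = [[x[g] for x, lab in zip(X, y) if lab == c] for c in classes]
        between = pvariance([mean(v) for v in groups])
        within = mean([pvariance(v) for v in groups])
        scores.append(between / (within + 1e-12))
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)


def stratified_folds(y, k, seed=0):
    """Split sample indices into k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for c in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == c]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def fit_centroids(X, y, genes):
    """Nearest-centroid model (assumed stand-in for the SVM in the abstract)."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == c]
        model[c] = [mean([r[g] for r in rows]) for g in genes]
    return model


def predict(model, x, genes):
    """Return the class whose centroid is closest on the selected genes."""
    def sq_dist(c):
        return sum((x[g] - m) ** 2 for g, m in zip(genes, model[c]))
    return min(model, key=sq_dist)


def cross_validate(X, y, k=5, n_top=5, seed=0):
    """k-fold CV accuracy with feature ranking redone on every training fold."""
    correct = 0
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        genes = rank_features(X_tr, y_tr)[:n_top]  # ranking sees training data only
        model = fit_centroids(X_tr, y_tr, genes)
        correct += sum(predict(model, X[i], genes) == y[i] for i in test_idx)
    return correct / len(y)


if __name__ == "__main__":
    X, y = make_toy_data()
    print("cross-validated accuracy:", cross_validate(X, y))
```

Ranking genes once on the full dataset and then cross-validating would leak information from the held-out samples into the selection and tend to inflate the reported accuracy, which is why the sketch repeats the ranking in every fold.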