Leukemia multiclass assessment and classification from Microarray and RNA-seq technologies integration at gene expression level
Metadata
Author
Castillo Secilla, Daniel; Gálvez Gómez, Juan Manuel; Herrera Maldonado, Luis Javier; Rojas Ruiz, Fernando José; Valenzuela Cansino, Olga; Caba Pérez, Octavio; Prados Salazar, José Carlos; Rojas Ruiz, Ignacio
Publisher
Public Library of Science (PLOS)
Date
2019-02-12
Bibliographic reference
Castillo D, Galvez JM, Herrera LJ, Rojas F, Valenzuela O, Caba O, et al. (2019) Leukemia multiclass assessment and classification from Microarray and RNA-seq technologies integration at gene expression level. PLoS ONE 14(2): e0212127. [https://doi.org/10.1371/journal.pone.0212127]
Sponsor
This work was supported by Project TIN2015-71873-R (Spanish Ministry of Economy and Competitiveness -MINECO- and the European Regional Development Fund -ERDF) and Junta de Andalucía (P12–TIC–2082).
Abstract
In recent years, a significant increase in the number of available biological experiments
has taken place due to the widespread use of massive sequencing data. Furthermore,
continuous developments in machine learning and high-performance
computing are allowing faster and more efficient analysis and processing of this
type of data. However, biological information about a given disease is normally scattered
across different sequencing technologies and manufacturers, in different
experiments carried out over the years around the world. Thus, it is now of paramount importance
to attain a correct integration of biologically-related data in order to achieve genuine
benefits from them. For this purpose, this work presents an integration of multiple Microarray
and RNA-seq platforms, which has led to the design of a multiclass study by collecting samples
from the four main types of leukemia, quantified at the gene expression level. Subsequently, in
order to find a set of differentially expressed genes with the highest discernment capability
among different types of leukemia, an innovative parameter referred to as coverage is presented
here. This parameter allows assessing the number of different pathologies that a
certain gene is able to discern. It has been evaluated together with other widely known
parameters under assessment of an ANOVA statistical test which corroborated its filtering
power when the identified genes are subjected to a machine learning process at multiclass
level. Optimal tuning of the evaluated gene extraction parameters by means of this statistical
test led to the selection of 42 highly relevant expressed genes. Using the minimum-Redundancy
Maximum-Relevance (mRMR) feature selection algorithm, these genes were
reordered and assessed with four different classification techniques. Outstanding
results were achieved by taking exclusively the first ten genes of the ranking into
consideration. Finally, specific literature was consulted on this last subset of genes, revealing
that practically all of them are associated with biological processes related to leukemia. In light
of these results, this study underlines the relevance of considering a new parameter
which facilitates the identification of highly valid expressed genes for simultaneously discerning
multiple types of leukemia.
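
The coverage idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes one plausible reading in which a gene's coverage is the number of pairwise class comparisons where the mean log2 expression differs by at least a threshold. The function names `coverage` and `filter_by_coverage`, the parameter `min_abs_lfc`, the gene names, and all expression values are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from statistics import mean


def coverage(expr_by_class, min_abs_lfc=1.0):
    """Count class pairs whose mean log2 expression differs by >= min_abs_lfc.

    Assumption: this pairwise count stands in for the coverage parameter;
    the published definition may differ.
    """
    class_means = {label: mean(values) for label, values in expr_by_class.items()}
    return sum(
        1
        for a, b in combinations(sorted(class_means), 2)
        if abs(class_means[a] - class_means[b]) >= min_abs_lfc
    )


def filter_by_coverage(genes, min_coverage, min_abs_lfc=1.0):
    """Return the names of genes whose coverage reaches min_coverage."""
    return [
        name
        for name, expr_by_class in genes.items()
        if coverage(expr_by_class, min_abs_lfc) >= min_coverage
    ]


# Toy log2 expression values per leukemia type (invented, not from the study).
toy_genes = {
    "GENE_A": {"ALL": [8.0, 8.2], "AML": [5.0, 5.1], "CLL": [8.1, 7.9], "CML": [2.0, 2.2]},
    "GENE_B": {"ALL": [6.0, 6.1], "AML": [6.2, 6.0], "CLL": [6.1, 6.1], "CML": [6.0, 6.2]},
}

selected = filter_by_coverage(toy_genes, min_coverage=3)
```

In this toy setting, a gene that separates most class pairs passes the filter, while a gene with nearly constant expression across the four types does not. A real pipeline would replace the raw mean difference with a proper differential expression test before counting.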