dc.contributor.author | Castillo Secilla, Daniel | es_ES |
dc.contributor.author | Gálvez Gómez, Juan Manuel | es_ES |
dc.contributor.author | Herrera Maldonado, Luis Javier | es_ES |
dc.contributor.author | San Román Arenas, Belén | es_ES |
dc.contributor.author | Rojas Ruiz, Fernando | es_ES |
dc.contributor.author | Rojas Ruiz, Ignacio | es_ES |
dc.date.accessioned | 2018-02-22T10:30:21Z | |
dc.date.available | 2018-02-22T10:30:21Z | |
dc.date.issued | 2017 | |
dc.identifier.citation | Castillo Secilla, D.; et al. Integration of RNA-Seq data with heterogeneous microarray data for breast cancer profiling. BMC Bioinformatics, 18: 506 (2017). [http://hdl.handle.net/10481/49668] | es_ES |
dc.identifier.issn | 1471-2105 | |
dc.identifier.uri | http://hdl.handle.net/10481/49668 | |
dc.description.abstract | Background: Nowadays, many public repositories containing large microarray gene expression datasets are
available. However, microarray technology is less powerful and accurate than more
recent Next Generation Sequencing technologies, such as RNA-Seq. Nevertheless, information from microarrays is
reliable and robust, so it can be exploited through the integration of microarray data with RNA-Seq data.
Additionally, information extraction and acquisition of a large number of samples in RNA-Seq still entails very high costs
in terms of time and computational resources. This paper proposes a new model to find the gene signature of breast
cancer cell lines through the integration of heterogeneous data from different breast cancer datasets, obtained from
microarray and RNA-Seq technologies. Data integration is therefore expected to lend more robust statistical
significance to the results obtained. Finally, a classification method is proposed to test the robustness of the
differentially expressed genes when unseen data are presented for diagnosis.
Results: The proposed data integration allows analyzing gene expression samples coming from different
technologies. The most significant genes of the whole integrated data were obtained through the intersection of
three gene sets: the expressed genes identified within the microarray data alone, within the RNA-Seq
data alone, and within the integrated data from both technologies. This intersection reveals 98 possible
technology-independent biomarkers. Two heterogeneous datasets were used for the classification
tasks: a training dataset for gene expression identification and classifier validation, and a test dataset of unseen data
for testing the classifier. Both achieved high classification accuracy, supporting the validity of the
obtained set of genes as possible biomarkers for breast cancer. Through a feature selection process, a final small
subset of six genes was selected for breast cancer diagnosis.
Conclusions: This work proposes a novel data integration stage in the traditional gene expression analysis pipeline
through the combination of heterogeneous data from microarray and RNA-Seq technologies. Available samples
have been successfully classified using a subset of six genes obtained by a feature selection method. Consequently, a
new classification and diagnosis tool was built and its performance was validated using previously unseen samples. | en_EN |
dc.description.sponsorship | This work was supported by Project TIN2015-71873-R (Spanish Ministry of
Economy and Competitiveness, MINECO, and the European Regional
Development Fund, ERDF). | es_ES |
dc.language.iso | eng | es_ES |
dc.publisher | BioMed Central | en_EN |
dc.rights | Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License | es_ES |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/ | es_ES |
dc.subject | RNA-Seq | en_EN |
dc.subject | Microarray | en_EN |
dc.subject | Breast cancer | en_EN |
dc.subject | Cancer | en_EN |
dc.subject | SVM | en_EN |
dc.subject | Random Forest | en_EN |
dc.subject | K-NN | en_EN |
dc.subject | Gene expression | en_EN |
dc.subject | Classification | en_EN |
dc.subject | Integration | en_EN |
dc.title | Integration of RNA-Seq data with heterogeneous microarray data for breast cancer profiling | en_EN |
dc.type | journal article | es_ES |
dc.rights.accessRights | open access | es_ES |
dc.identifier.doi | 10.1186/s12859-017-1925-0 | |