KnowSeq R-Bioc package: The automatic smart gene expression tool for retrieving relevant biological knowledge

Castillo Secilla, Daniel; Gálvez Gómez, Juan Manuel; Carrillo Pérez, Francisco; Verona Almeida, Marta; Redondo Sánchez, Daniel; Herrera Maldonado, Luis Javier; Rojas Ruiz, Ignacio

Bioconductor; Gene expression; Classification; Enrichment; Bioinformatics

This work was funded by the Spanish Ministry of Sciences, Innovation and Universities under Project RTI2018-101674-B-I00 titled "Computer Architectures and Machine Learning-based solutions for complex challenges in Bioinformatics, Biotechnology and Biomedicine", in collaboration with the Government of Andalusia under Postdoctoral Grant P12TIC2082. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of this manuscript. The results published here are in whole or part based upon data generated by the TCGA Research Network: https://www.cancer.gov/tcga.

The KnowSeq R/Bioc package is designed as a powerful, scalable and modular software tool that automates and integrates established bioinformatics tools and adds new features and functionalities. It provides a unified environment for complex gene expression analyses, covering all the processing steps needed to identify a gene signature for a specific disease and to extract interpretable knowledge from it. The process can start from raw files, either available on well-known platforms or provided by the users themselves, and in either case coming from different information sources and different transcriptomic technologies. The pipeline uses a set of advanced algorithms, including an adaptation of a novel procedure for selecting the most representative genes in a given multiclass problem.
KnowSeq also embeds an intelligent system for classifying new patients, which lets the user choose among several classification and feature selection methods that are well known and widely used in bioinformatics. Furthermore, KnowSeq automatically generates a complete and detailed HTML report of the whole process, which is also modular and scalable. Two case studies, a two-class breast cancer problem and a multiclass lung cancer problem, were used to rigorously assess the usability and efficiency of KnowSeq. The models built from the differentially expressed genes obtained in both experiments reach high classification rates. In addition, biological knowledge was extracted in terms of Gene Ontologies, pathways and related diseases, with the aim of supporting the expert in the decision-making process. KnowSeq is available at Bioconductor (https://bioconductor.org/packages/KnowSeq), GitHub (https://github.com/CasedUgr/KnowSeq) and Docker (https://hub.docker.com/r/casedugr/knowseq).

2021-07-09T07:56:30Z
2021-07-09T07:56:30Z
2021-04-13
info:eu-repo/semantics/article

Daniel Castillo-Secilla... [et al.]. KnowSeq R-Bioc package: The automatic smart gene expression tool for retrieving relevant biological knowledge, Computers in Biology and Medicine, Volume 133, 2021, 104387, ISSN 0010-4825, [https://doi.org/10.1016/j.compbiomed.2021.104387]

http://hdl.handle.net/10481/69620
10.1016/j.compbiomed.2021.104387
eng
http://creativecommons.org/licenses/by-nc-nd/3.0/es/
info:eu-repo/semantics/openAccess
Attribution-NonCommercial-NoDerivatives 3.0 Spain
Elsevier