Computational and statistical methods for integrated analysis of biomedical data

Martorell Marugán, Jordi

80809(1).pdf (7.315Mb)

Identifiers

URI: http://hdl.handle.net/10481/68192

ISBN: 9788413068534

Publisher

Universidad de Granada

Supervisors

Carmona Sáez, Pedro; González Rumayor, Víctor

Department

Universidad de Granada. Programa de Doctorado en Biomedicina

Subject

Biomedical data

Computational methods

Date

2021

Defense date

2021-04-27

Bibliographic reference

Martorell Marugán, Jordi. Computational and statistical methods for integrated analysis of biomedical data. Granada: Universidad de Granada, 2021. [http://hdl.handle.net/10481/68192]

Sponsor

Tesis Univ. Granada; Ayudas para contratos para la formación de investigadores en empresas (Doctorados Industriales) 2016, Ministerio de Economía, Industria y Competitividad; Short-Term Fellowship, European Molecular Biology Organization (EMBO); Identificación de biomarcadores en lupus eritematoso sistémico mediante análisis integrado de transcriptoma y metiloma, Consejería de Salud de la Junta de Andalucía, reference PI-0173-2017; Molecular reclassification to find clinically useful biomarkers for systemic autoimmune diseases (PRECISESADS), EU-Innovative Medicines Initiative (IMI), reference 115565.

Abstract

In recent years, new omics technologies have revolutionized the biomedical research paradigm, shifting it from studying a few specific elements based on prior hypotheses to studying complete systems, such as the genome or the transcriptome, and generating hypotheses from the data. This change has created the need for a new profile in biomedical research, the bioinformatician or computational biologist, who combines knowledge of biology, computer science and statistics in order to analyse these huge amounts of data and to develop new analytical methods. In this context of massive data generation, various public repositories were created where researchers can submit the data generated in their studies, with the aim of guaranteeing the reproducibility of their results and of making the data usable in other retrospective studies. The amount of data stored in public repositories has since grown exponentially thanks to the falling cost of the technologies needed to generate them. One of the most widely used repositories is the Gene Expression Omnibus (GEO), maintained by the NCBI. GEO contains data generated in all types of omics projects, including gene expression, methylation and DNA sequencing, among others. The availability of so much information offers an invaluable resource for generating and testing hypotheses through the use and integration of these data. To that end, however, proper statistical and computational methods for integrating information are necessary. One strategy for reanalysing public data is meta-analysis, which consists of combining the results from different studies using appropriate statistical techniques in order to increase statistical power and resolve discrepancies between studies, among other applications.
The main objective of this doctoral thesis has been the development of computational methods for integrating heterogeneous data sets from different sources, in order to analyse them jointly using meta-analysis and integrated analysis methods.
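
As an illustration of how a meta-analysis can combine results from different studies, the following is a minimal sketch of a fixed-effect, inverse-variance pooled estimate. It is not taken from the thesis: the function name `fixed_effect_meta`, the choice of a fixed-effect model and the example numbers are all assumptions made for illustration only.

```python
import math


def fixed_effect_meta(effects, std_errors):
    """Pool per-study effect estimates using inverse-variance weights.

    Hypothetical helper for illustration; assumes a fixed-effect model,
    i.e. that every study estimates the same underlying effect.
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("effects and std_errors must be non-empty and of equal length")

    # Each study is weighted by the inverse of its variance,
    # so more precise studies contribute more to the pooled estimate.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)

    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    # Two-sided p-value under a standard normal approximation.
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, z, p_value


# Made-up example: the same effect measured in three studies.
effects = [0.5, 0.3, 0.4]
std_errors = [0.2, 0.2, 0.2]
pooled, pooled_se, z, p_value = fixed_effect_meta(effects, std_errors)
print(f"pooled effect = {pooled:.3f}, standard error = {pooled_se:.3f}, p = {p_value:.4f}")
```

The pooled standard error is smaller than that of any single study, which is the sense in which combining studies increases statistical power. When the studies disagree more than sampling error would explain, a random-effects model is usually a better fit than this fixed-effect sketch.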

Collections

Theses

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Spain