Assessing the complementary information from an increased number of biologically relevant features in liquid biopsy-derived RNA-Seq data
Metadata
Author
Giannoukakos, Stavros Panagiotis; D’Ambrosi, Silvia; Koppers-Lalic, Danijela; Gómez Martín, Cristina; Fernández Hilario, Alberto Luis; Hackenberg, Michael
Publisher
Elsevier
Subject
Liquid biopsy; Bioinformatics; Machine learning
Date
2024-03-12
Bibliographic reference
Giannoukakos, Stavros, et al. Assessing the complementary information from an increased number of biologically relevant features in liquid biopsy-derived RNA-Seq data. Heliyon 10 (2024) e27360 [10.1016/j.heliyon.2024.e27360]
Sponsor
European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement ELBA No 765492
Abstract
Liquid biopsy-derived RNA sequencing (lbRNA-seq) exhibits significant promise for clinic-oriented
cancer diagnostics due to its non-invasiveness and ease of repeatability. Despite substantial
advancements, obstacles like technical artefacts and process standardisation impede
seamless clinical integration. Alongside addressing technical aspects such as normalising fluctuating
low-input material and establishing a standardised clinical workflow, the lack of result
validation using independent datasets remains a critical factor contributing to the often low
reproducibility of liquid biopsy-detected biomarkers.
Considering the outlined drawbacks, our objective was to establish a workflow that: (1) harnesses
the rich diversity of biological features accessible through lbRNA-seq data, encompassing a broad
range of molecular and functional attributes, and integrates them via a Machine Learning-based
Ensemble Classification framework, enabling a unified analysis of the information encoded within
the data; (2) implements and rigorously benchmarks intra-sample normalisation methods to heighten
their relevance within clinical settings; and (3) is thoroughly assessed across independent
test sets to ascertain its robustness and potential utility.
Using ten datasets from several studies comprising three different sources of biological material,
we first show that while the best-performing normalisation methods depend strongly on the
dataset and coupled Machine Learning method, the rather simple Counts Per Million method is
generally very robust, showing comparable performance to cross-sample methods. Subsequently, we
demonstrate that the innovative biofeature types introduced in this study, such as the Fraction
of Canonical Transcript, harbour complementary information. Consequently, their inclusion
consistently enhances prediction power compared to models relying solely on gene expression-based
biofeatures. Finally, we demonstrate that the workflow is robust on completely independent
datasets, generally from different labs and/or different protocols. Taken together, the
workflow presented here outperforms commonly employed methods in prediction accuracy and,
owing to its specific design, may hold potential for application in clinical diagnostics.
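
The Counts Per Million (CPM) method mentioned in the abstract is an intra-sample normalisation: each raw count is divided by the sample's total count and multiplied by one million, so no other sample is needed. A minimal Python sketch of the standard CPM formula follows. It is not the authors' implementation, and the function name and the feature names (GENE_A, GENE_B, GENE_C) are hypothetical, chosen only for illustration.

```python
def counts_per_million(counts):
    """Scale one sample's raw counts so that they sum to one million.

    `counts` maps a feature name to its raw read count in a single sample.
    Only this sample's own total is used (intra-sample normalisation).
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no counted reads")
    return {feature: value * 1_000_000 / total for feature, value in counts.items()}


# Hypothetical raw counts for one sample (illustrative values only).
example_sample = {"GENE_A": 150, "GENE_B": 50, "GENE_C": 300}
cpm = counts_per_million(example_sample)
```

Because the scaling factor depends only on the sample being processed, a newly sequenced sample can be normalised on its own, without access to the rest of a cohort.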