Mining high average-utility sequential rules to identify high-utility gene expression sequences in longitudinal human studies

Segura-Delgado, Alberto; Anguita Ruiz, Augusto; Alcalá Fernández, Rafael; Alcalá Fernández, Jesús

doi:10.1016/j.eswa.2021.116411

Identifiers

URI: https://hdl.handle.net/10481/96378

DOI: 10.1016/j.eswa.2021.116411

Publisher

Elsevier

Subject

High average-utility sequential rules

Multi-objective evolutionary algorithm

eXplainable artificial intelligence

Gene expression patterns

Obesity

Date

2022-05-01

Bibliographic reference

Segura-Delgado, A., Anguita-Ruiz, A., Alcalá, R., Alcalá-Fdez, J. Mining high average-utility sequential rules to identify high-utility gene expression sequences in longitudinal human studies, Expert Systems with Applications, Volume 193, 2022, 116411, ISSN 0957-4174, https://doi.org/10.1016/j.eswa.2021.116411.

Sponsorship

ERDF; Regional Government of Andalusia; Ministry of Economic Transformation, Industry, Knowledge and Universities P18-RT-2248; Health Institute Carlos III; Spanish Ministry of Science, Innovation and Universities PI20/00711

Abstract

High-utility sequential pattern mining techniques have demonstrated good performance in identifying associations between mRNA levels in microarray experiments, taking into account both the biological context of each gene and the temporal characteristics of the dataset. However, these patterns do not provide information about how likely it is that the events in the pattern occur in the order indicated, so causal relationships cannot be established between them. This reduces their predictive ability and makes them difficult to apply directly to the dynamic modeling of gene expression. An alternative to sequential patterns that takes the confidence of the forecast into account is the discovery of sequential rules. Their natural and seamless relation to human behavior makes them well suited to understanding complex models, while the generated rules can still be used as a standalone prediction model. This contribution proposes a multi-objective evolutionary algorithm for mining biologically relevant high average-utility sequential rules from longitudinal human gene expression data with a good compromise between average-utility and explainability. The proposal enhances the well-known NSGA-II to learn, by evolutionary optimization, the rules that maximize two objectives: Utility and Interestingness. Moreover, a restarting mechanism and an external population have been specifically designed and included to encourage diversity in the search process while preserving all the rules found. The quality of our approaches has been analyzed using external biological resources, statistical analysis, and comparison with other proposals from the literature.
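The abstract describes selecting rules that maximize two objectives at once, Utility and Interestingness. As a rough illustration of what a "good compromise" means in that setting, the sketch below filters a set of candidate rules down to those that are not Pareto-dominated, which is the notion of optimality that NSGA-II style algorithms build on. It is not the authors' algorithm: the rule labels, the objective values, and the function names `dominates` and `non_dominated` are all hypothetical, and the evolutionary search, restarting mechanism, and external population are omitted.

```python
from typing import List, Tuple

# (label, utility, interestingness): all values below are made up for illustration.
Rule = Tuple[str, float, float]


def dominates(a: Rule, b: Rule) -> bool:
    """True if a is at least as good as b on both objectives and strictly better on one."""
    at_least_as_good = a[1] >= b[1] and a[2] >= b[2]
    strictly_better = a[1] > b[1] or a[2] > b[2]
    return at_least_as_good and strictly_better


def non_dominated(rules: List[Rule]) -> List[Rule]:
    """Keep only the rules that no other rule dominates (the Pareto front)."""
    return [r for r in rules if not any(dominates(o, r) for o in rules)]


candidates: List[Rule] = [
    ("r1", 0.90, 0.20),
    ("r2", 0.60, 0.70),
    ("r3", 0.50, 0.60),  # worse than r2 on both objectives
    ("r4", 0.30, 0.95),
]

front = non_dominated(candidates)
print([label for label, _, _ in front])
```

In this toy input, r3 is discarded because r2 is better on both objectives, while r1, r2, and r4 each represent a different trade-off and are all kept.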

Collections

DCCIA - Articles

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International