Mining high average-utility sequential rules to identify high-utility gene expression sequences in longitudinal human studies
Metadata
Author
Delgado-Segura, Alberto; Anguita Ruiz, Augusto; Alcalá Fernández, Rafael; Alcalá Fernández, Jesús
Publisher
Elsevier
Subject
High average-utility sequential rules; Multi-objective evolutionary algorithm; eXplainable artificial intelligence; Gene expression patterns; Obesity
Date
2022-05-01
Bibliographic reference
Segura-Delgado, A., Anguita-Ruiz, A., Alcalá, R., Alcalá-Fdez, J.
Mining high average-utility sequential rules to identify high-utility gene expression sequences in longitudinal human studies,
Expert Systems with Applications,
Volume 193,
2022,
116411,
ISSN 0957-4174,
https://doi.org/10.1016/j.eswa.2021.116411.
Sponsorship
ERDF; Regional Government of Andalusia; Ministry of Economic Transformation, Industry, Knowledge and Universities P18-RT-2248; Health Institute Carlos III; Spanish Ministry of Science, Innovation and Universities PI20/00711
Abstract
High-utility sequential pattern mining techniques have demonstrated good performance in identifying associations between mRNA levels in microarray experiments, taking into account both the biological context of each gene and the temporal characteristics of the dataset. However, these patterns do not indicate how likely it is that the events in a pattern occur in the stated order, so causal relationships cannot be established between them. This reduces their predictive ability and makes it difficult to apply them directly to the dynamic modeling of gene expression. An alternative to sequential patterns that takes the confidence of the forecast into account is the discovery of sequential rules. Their natural and seamless relation to human behavior makes them well suited to understanding complex models, while the generated rules can still be used as a standalone prediction model. This contribution proposes a multi-objective evolutionary algorithm for mining biologically relevant high average-utility sequential rules from longitudinal human gene expression data, with a good compromise between average utility and explainability. The proposal extends the well-known NSGA-II to learn, by evolutionary optimization, the rules that maximize two objectives: Utility and Interestingness. Moreover, a restarting mechanism and an external population have been specifically designed and included to encourage diversity in the search process while preserving all the rules found. The quality of our approach has been analyzed using external biological resources, statistical analysis, and comparison with other proposals from the literature.
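As a rough illustration of what maximizing two objectives at once means, the sketch below keeps only the non-dominated rules from a small candidate set, which is the Pareto idea that NSGA-II builds its selection on. It is not the authors' algorithm: the rule labels, the scores, and the helper names `dominates` and `pareto_front` are all hypothetical, and how Utility and Interestingness are actually computed is defined in the paper, not here.

```python
from typing import List, Tuple

# (label, utility, interestingness); all values below are made up for illustration.
Rule = Tuple[str, float, float]


def dominates(a: Rule, b: Rule) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a[1] >= b[1] and a[2] >= b[2]
    strictly_better = a[1] > b[1] or a[2] > b[2]
    return no_worse and strictly_better


def pareto_front(rules: List[Rule]) -> List[Rule]:
    """Keep the rules that no other rule dominates (both objectives maximized)."""
    return [r for r in rules if not any(dominates(other, r) for other in rules)]


# Hypothetical candidate rules, not taken from the study.
candidates: List[Rule] = [
    ("geneA_up -> geneB_down", 0.90, 0.20),
    ("geneC_up -> geneD_up", 0.60, 0.70),
    ("geneE_down -> geneF_up", 0.50, 0.60),
    ("geneG_up -> geneH_down", 0.30, 0.95),
]

front = pareto_front(candidates)
for label, utility, interestingness in front:
    print(f"{label}: utility={utility}, interestingness={interestingness}")
```

In this toy set the third rule is dropped because the second is better on both objectives; the other three remain because each trades one objective against the other.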