Mining high average-utility sequential rules to identify high-utility gene expression sequences in longitudinal human studies
Delgado-Segura, Alberto; Anguita Ruiz, Augusto; Alcalá Fernández, Rafael; Alcalá Fernández, Jesús
High average-utility sequential rules; Multi-objective evolutionary algorithm; eXplainable artificial intelligence; Gene expression patterns; Obesity
This work was supported by the ERDF/Regional Government of Andalusia/Ministry of Economic Transformation, Industry, Knowledge and Universities (grant number P18-RT-2248) and the ERDF/Health Institute Carlos III/Spanish Ministry of Science, Innovation and Universities (grant number PI20/00711).
High-utility sequential pattern mining techniques have demonstrated good performance in identifying associations between mRNA levels in microarray experiments, taking into account both the biological context of each gene and the temporal characteristics of the dataset. However, these patterns provide no information about how likely it is that the events in a pattern occur in the order indicated, so causal relationships cannot be established between them. This reduces their predictive ability and makes it difficult to apply them directly to the dynamic modeling of gene expression. An alternative to sequential patterns that takes the confidence of the forecast into account is the discovery of sequential rules. Their natural and seamless relation to human behavior makes them well suited to understanding complex models, while the generated rules can still be used as a standalone prediction model. This contribution proposes a multi-objective evolutionary algorithm for mining biologically relevant high average-utility sequential rules from longitudinal human gene expression data with a good trade-off between average utility and explainability.
This proposal enhances the well-known NSGA-II algorithm to learn, through evolutionary optimization, the rules that maximize two objectives: Utility and Interestingness. Moreover, a restart mechanism and an external population have been specifically designed and included to encourage diversity in the search process while preserving all the rules found. The quality of our approaches has been analyzed using external biological resources, statistical analysis, and comparisons with other proposals from the literature.
2024-10-28T08:25:29Z
2024-10-28T08:25:29Z
2022-05-01
journal article
Segura-Delgado, A., Anguita-Ruiz, A., Alcalá, R., Alcalá-Fdez, J. Mining high average-utility sequential rules to identify high-utility gene expression sequences in longitudinal human studies. Expert Systems with Applications, Volume 193, 2022, 116411, ISSN 0957-4174. https://doi.org/10.1016/j.eswa.2021.116411
https://hdl.handle.net/10481/96378
10.1016/j.eswa.2021.116411
eng
http://creativecommons.org/licenses/by-nc-nd/4.0/
open access
Attribution-NonCommercial-NoDerivatives 4.0 International
Elsevier