Developing Big Data anomaly dynamic and static detection algorithms: AnomalyDSD spark package

García Gil, Diego Jesús; López, David; Argüelles-Martino, Daniel; Carrasco Castillo, Jacinto; Aguilera Martos, Ignacio; Luengo Martín, Julián; Herrera Triguero, Francisco

doi:10.1016/j.ins.2024.121587

1-s2.0-S0020025524015019-main.pdf (1.098 MB)

Identifiers

URI: https://hdl.handle.net/10481/102154

DOI: 10.1016/j.ins.2024.121587

Publisher

Elsevier

Subject

Big Data

Anomaly detection

Outlier detection

Unsupervised learning

Date

2025-02

Bibliographic reference

D. García-Gil, D. López, D. Argüelles-Martino et al. Information Sciences 690 (2025) 121587. https://doi.org/10.1016/j.ins.2024.121587

Sponsor

National Institute of Cybersecurity (INCIBE) IAFER-Cib (C074/23); University of Granada; European Union (Next Generation)

Abstract

Background: Anomaly detection is the process of identifying observations that differ greatly from the majority of the data. Unsupervised anomaly detection aims to find outliers in unlabeled data, where the anomalous instances are not known in advance. Exponential growth in data generation has led to the era of Big Data. This scenario brings new challenges to classic anomaly detection problems due to the massive and unsupervised accumulation of data. Traditional methods cannot cope with the computing and time requirements of Big Data problems. Methods: In this paper, we propose four distributed algorithm designs for Big Data anomaly detection problems: HBOS_BD, LODA_BD, LSCP_BD, and XGBOD_BD. They have been designed following the MapReduce distributed methodology so that they can handle Big Data problems. Results: These algorithms have been integrated into a Spark package named AnomalyDSD, focused on static and dynamic Big Data anomaly detection tasks. Experiments on a real-world case study have shown the performance and validity of the proposals for Big Data problems. Conclusions: With this proposal, we enable practitioners to detect anomalies efficiently and effectively in Big Data datasets, where the early detection of an anomaly can lead to a proper and timely decision.
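To make the MapReduce idea in the abstract concrete, the following is a minimal single-machine sketch of a histogram-based outlier score computed in map and reduce steps. It is not the HBOS_BD design from the paper and does not use the AnomalyDSD API. All names (N_BINS, local_range, local_histograms, hbos_score), the equal-width binning, the add-one smoothing, and the toy data are assumptions made for illustration.

```python
import math
from functools import reduce

N_BINS = 4  # hypothetical number of equal-width bins per feature


def local_range(partition):
    """Map step 1: per-feature (min, max) within one partition."""
    return [(min(col), max(col)) for col in zip(*partition)]


def merge_ranges(a, b):
    """Reduce step 1: combine per-feature ranges from two partitions."""
    return [(min(la, lb), max(ha, hb)) for (la, ha), (lb, hb) in zip(a, b)]


def bin_index(value, lo, hi):
    """Index of the equal-width bin containing value, clamped to [0, N_BINS - 1]."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * N_BINS)
    return min(max(idx, 0), N_BINS - 1)


def local_histograms(partition, ranges):
    """Map step 2: per-feature bin counts for the rows of one partition."""
    counts = [[0] * N_BINS for _ in ranges]
    for row in partition:
        for j, value in enumerate(row):
            lo, hi = ranges[j]
            counts[j][bin_index(value, lo, hi)] += 1
    return counts


def merge_histograms(a, b):
    """Reduce step 2: element-wise sum of two sets of per-feature counts."""
    return [[x + y for x, y in zip(fa, fb)] for fa, fb in zip(a, b)]


def hbos_score(row, counts, ranges, total):
    """Sum over features of log(1 / density); sparser bins give higher scores."""
    score = 0.0
    for j, value in enumerate(row):
        lo, hi = ranges[j]
        c = counts[j][bin_index(value, lo, hi)]
        density = (c + 1) / (total + N_BINS)  # add-one smoothing (assumption)
        score += math.log(1.0 / density)
    return score


# Toy data split into two "partitions"; the last row is an obvious outlier.
partitions = [
    [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9]],
    [[1.0, 2.0], [1.2, 2.2], [8.0, 9.0]],
]

ranges = reduce(merge_ranges, map(local_range, partitions))
counts = reduce(merge_histograms, (local_histograms(p, ranges) for p in partitions))
total = sum(len(p) for p in partitions)
scores = [hbos_score(row, counts, ranges, total) for p in partitions for row in p]
```

Because each map function only sees its own partition and each reduce function is associative, the same functions could in principle be passed to a distributed engine's map and reduce operators; here they are driven by the standard library's map and functools.reduce.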

Collections

OpenAIRE (Open Access Infrastructure for Research in Europe)

Except where otherwise noted, this document's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International