Developing Big Data anomaly dynamic and static detection algorithms: AnomalyDSD spark package
Metadata
Author
García Gil, Diego Jesús; López, David; Argüelles-Martino, Daniel; Carrasco Castillo, Jacinto; Aguilera Martos, Ignacio; Luengo Martín, Julián; Herrera Triguero, Francisco
Publisher
Elsevier
Subject
Big Data; Anomaly detection; Outlier detection; Unsupervised learning
Date
2025-02
Bibliographic reference
D. García-Gil, D. López, D. Argüelles-Martino et al. Information Sciences 690 (2025) 121587. https://doi.org/10.1016/j.ins.2024.121587
Sponsor
National Institute of Cybersecurity (INCIBE) IAFER-Cib (C074/23); University of Granada; European Union (Next Generation)
Abstract
Background: Anomaly detection is the process of identifying observations that differ greatly from
the majority of data. Unsupervised anomaly detection aims to find outliers in unlabeled data, so
the anomalous instances are not known in advance. The exponential growth in data generation has led
to the era of Big Data. This scenario brings new challenges to classic anomaly detection problems
due to the massive and unsupervised accumulation of data. Traditional methods cannot cope with
the computing and time requirements of Big Data problems.
Methods: In this paper, we propose four distributed algorithm designs for Big Data anomaly
detection problems: HBOS_BD, LODA_BD, LSCP_BD, and XGBOD_BD. They have been designed
following the MapReduce distributed methodology in order to be capable of handling Big Data
problems.
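
To make the MapReduce idea concrete, the following is a minimal, single-machine sketch of a histogram-based outlier score computed in map and reduce steps. It is not the HBOS_BD implementation from the paper or the AnomalyDSD API: the function names, the fixed bin count N_BINS, the use of relative bin frequencies (rather than normalized histogram heights), and the toy data are all assumptions made for illustration. Plain Python lists stand in for distributed partitions.

```python
import math
from collections import Counter
from functools import reduce

N_BINS = 10  # hypothetical number of equal-width bins per feature


def feature_ranges(partitions):
    """First pass: global (min, max) per feature, itself a map + reduce."""
    def local(part):
        return [(min(col), max(col)) for col in zip(*part)]

    def merge(a, b):
        return [(min(x[0], y[0]), max(x[1], y[1])) for x, y in zip(a, b)]

    return reduce(merge, (local(p) for p in partitions if p))


def bin_index(value, lo, hi):
    """Equal-width bin of `value` within [lo, hi], clamped to the last bin."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * N_BINS), N_BINS - 1)


def map_counts(part, ranges):
    """Map step: per-partition counts keyed by (feature, bin)."""
    counts = Counter()
    for row in part:
        for j, v in enumerate(row):
            counts[(j, bin_index(v, *ranges[j]))] += 1
    return counts


def reduce_counts(a, b):
    """Reduce step: merge partial histograms by summing counts."""
    return a + b


def score(row, ranges, counts, n_rows):
    """Sum over features of log(1 / relative bin frequency)."""
    total = 0.0
    for j, v in enumerate(row):
        count = counts.get((j, bin_index(v, *ranges[j])), 0)
        freq = max(count, 1) / n_rows  # floor avoids division by zero
        total += math.log(1.0 / freq)
    return total


# Toy data: two "partitions", one obvious outlier at (10.0, 10.0).
partitions = [
    [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2)],
    [(1.0, 1.1), (1.2, 1.0), (10.0, 10.0)],
]
n_rows = sum(len(p) for p in partitions)
ranges = feature_ranges(partitions)
counts = reduce(reduce_counts, (map_counts(p, ranges) for p in partitions))
scores = {row: score(row, ranges, counts, n_rows)
          for p in partitions for row in p}
```

In this sketch only the small per-feature histograms cross partition boundaries, never the raw rows, which is what makes this kind of design a natural fit for MapReduce. A higher score marks a more anomalous row, so the point at (10.0, 10.0) ranks first here.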
Results: These algorithms have been integrated into a Spark package, AnomalyDSD, focused on static
and dynamic Big Data anomaly detection tasks. Experiments on a real-world case study have shown
the performance and validity of the proposals for Big Data problems.
Conclusions: With this proposal, we have enabled the practitioner to efficiently and effectively
detect anomalies in Big Data datasets, where the early detection of an anomaly can lead to a
proper and timely decision.