Multi-step histogram based outlier scores for unsupervised anomaly detection: ArcelorMittal engineering dataset case of study
Identifiers
URI: https://hdl.handle.net/10481/82022
Author
Aguilera Martos, Ignacio; García-Barzana, Marta; García Gil, Diego Jesús; Carrasco Castillo, Jacinto; López Pretel, David; Luengo Martín, Julián; Herrera Triguero, Francisco
Publisher
Elsevier
Subject
Histograms; Anomaly detection; Unsupervised learning; Time series
Date
2023-08-01
Bibliographic reference
Aguilera-Martos, I., García-Barzana, M., García-Gil, D., Carrasco, J., López, D., Luengo, J., & Herrera, F. (2023). Multi-step Histogram Based Outlier Scores for Unsupervised Anomaly Detection: ArcelorMittal Engineering Dataset Case of Study. Neurocomputing, 126228.
Sponsor
Ministry of Science and Technology under project PID2020-119478 GB-I00; Contract UGR-AM OTRI-426; Andalusian Excellence project P18-FR-496; Spanish Ministry of Science under the FPU Programme 998758-2016
Abstract
Anomaly detection is the task of detecting samples that behave differently from the rest of the data or that include abnormal values. Unsupervised anomaly detection is the most common scenario, which implies that the algorithms cannot train on labeled input and do not know the anomalous behavior beforehand. Histogram-based methods are among the most popular approaches in unsupervised anomaly detection, offering good performance and low runtime. Despite this, histogram-based anomaly detectors cannot process data flows while updating their knowledge, and they cannot deal with a large number of samples.
In this paper, we propose a new histogram-based approach that addresses these problems by introducing the ability to update the information inside a histogram. We have applied these strategies to design a new algorithm called Multi-step Histogram Based Outlier Scores (MHBOS), which includes five new histogram update mechanisms. The results demonstrate the validity of MHBOS and of the proposed update strategies in terms of both detection performance and computing time.
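
The general idea of a histogram-based outlier score whose histograms can be updated batch by batch can be illustrated with a minimal sketch. It is not the MHBOS algorithm and does not reproduce any of the five update mechanisms from the paper. The class `StreamingHistogram`, the function `hbos_score`, the fixed bin range, the additive count update, and the Laplace smoothing are all assumptions made for illustration.

```python
import math


class StreamingHistogram:
    """Fixed-width histogram over [lo, hi] with counts updated batch by batch.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def __init__(self, lo, hi, n_bins=10):
        self.lo, self.hi, self.n_bins = lo, hi, n_bins
        self.counts = [0] * n_bins
        self.total = 0

    def _bin(self, x):
        # Map x to a bin index, clamping out-of-range values to the edge bins.
        pos = int((x - self.lo) / (self.hi - self.lo) * self.n_bins)
        return min(max(pos, 0), self.n_bins - 1)

    def update(self, values):
        # Assumed update rule: simply add the new batch to the existing counts.
        for x in values:
            self.counts[self._bin(x)] += 1
            self.total += 1

    def score(self, x):
        # Laplace-smoothed relative frequency, so empty bins give a finite score.
        freq = (self.counts[self._bin(x)] + 1) / (self.total + self.n_bins)
        return -math.log(freq)


def hbos_score(histograms, sample):
    """Sum of per-feature scores; a larger value means a rarer sample."""
    return sum(h.score(x) for h, x in zip(histograms, sample))


# Two features, each with its own histogram, fed with two successive batches.
hists = [StreamingHistogram(0.0, 10.0), StreamingHistogram(0.0, 10.0)]
batches = [
    [(1.0, 2.0), (1.2, 2.1), (0.9, 2.3)],
    [(1.1, 2.2), (1.3, 1.9), (9.5, 9.0)],
]
for batch in batches:
    for j, h in enumerate(hists):
        h.update(row[j] for row in batch)

normal_score = hbos_score(hists, (1.0, 2.0))
rare_score = hbos_score(hists, (9.5, 9.0))
```

In this sketch the sample that falls into sparsely populated bins receives the higher score, and no earlier batch has to be kept in memory once its counts have been added.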