Interpretable Feature Learning in Multivariate Big Data Analysis for Network Monitoring

Camacho Páez, José; Wasielewska, Katarzyna; Bro, Rasmus; Kotz, David

doi:10.1109/TNSM.2024.3368501

Accepted paper (4.921Mb)

Identifiers

URI: https://hdl.handle.net/10481/93666

DOI: 10.1109/TNSM.2024.3368501

Publisher

IEEE

Subject

Data models

Analytical models

Monitoring

Big Data

Representation learning

Principal component analysis

Data visualization

Date

2024

Bibliographic reference

J. Camacho, K. Wasielewska, R. Bro and D. Kotz, "Interpretable Feature Learning in Multivariate Big Data Analysis for Network Monitoring," in IEEE Transactions on Network and Service Management, vol. 21, no. 3, pp. 2926-2943, June 2024, doi: 10.1109/TNSM.2024.3368501. Keywords: Data models; Analytical models; Monitoring; Big Data; Representation learning; Principal component analysis; Data visualization; Interpretable machine learning; multivariate big data analysis; anomaly detection; big data; UGR’16; Dartmouth campus Wi-Fi; network monitoring.

Sponsor

10.13039/100000001-US National Science Foundation (Grant Number: 0454062); Agencia Estatal de Investigación in Spain (Grant Number: PID2020-113462RBI00); 10.13039/100010665-European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie (Grant Number: 893146); Universidad de Granada/CBUA

Abstract

There is an increasing interest in the development of new data-driven models useful to assess the performance of communication networks. For many applications, like network monitoring and troubleshooting, a data model is of little use if it cannot be interpreted by a human operator. In this paper, we present an extension of the Multivariate Big Data Analysis (MBDA) methodology, a recently proposed interpretable data analysis tool. In this extension, we propose a solution to the automatic derivation of features, a cornerstone step for the application of MBDA when the amount of data is massive. The resulting network monitoring approach allows us to detect and diagnose disparate network anomalies, with a data-analysis workflow that combines the advantages of interpretable and interactive models with the power of parallel processing. We apply the extended MBDA to two case studies: UGR’16, a benchmark flow-based real-traffic dataset for anomaly detection, and Dartmouth’18, the longest and largest Wi-Fi trace known to date.

Collections

DTSTC - Articles

Except where otherwise noted, this item's license is described as Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License