DC Field | Value | Language
dc.contributor.author | Camacho Páez, José |
dc.contributor.author | Wasielewska, Katarzyna |
dc.contributor.author | Espinosa, Pablo |
dc.contributor.author | Fuentes García, Raquel María |
dc.date.accessioned | 2023-04-24T07:45:59Z |
dc.date.available | 2023-04-24T07:45:59Z |
dc.date.issued | 2023 |
dc.identifier.uri | https://hdl.handle.net/10481/81203 |
dc.description.abstract | Autonomous or self-driving networks are expected to provide a solution to the myriad of extremely demanding new applications in the Future Internet. The key to handling complexity is to perform tasks such as network optimization and failure recovery with minimal human supervision. For this purpose, the community relies on the development of new Machine Learning (ML) models and techniques. However, ML can only be as good as the data it is fitted with. Datasets provided to the community as benchmarks for research purposes, which have a significant impact on research findings and directions, are often assumed to be of good quality by default. In this paper, we show that relatively minor modifications to the same benchmark dataset (UGR’16, a flow-based real-traffic dataset for anomaly detection) have a significantly greater impact on model performance than the choice of ML technique. To understand this finding, we contribute a methodology to investigate the root causes of those differences and to assess the quality of the data labelling. Our findings illustrate the need to devote more attention to (automatic) data quality assessment and optimization techniques in the context of autonomous networks. | es_ES
dc.description.sponsorship | This work was supported by the Agencia Estatal de Investigación in Spain, grant No PID2020-113462RB-I00, and the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 893146. | es_ES
dc.language.iso | eng | es_ES
dc.publisher | NOMS 2023-2023 IEEE/IFIP Network Operations and Management Symposium | es_ES
dc.rights | Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License | en_EN
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/ | en_EN
dc.subject | Netflow | es_ES
dc.subject | UGR’16 | es_ES
dc.subject | anomaly detection | es_ES
dc.subject | data quality | es_ES
dc.title | Quality In / Quality Out: Data quality more relevant than model choice in anomaly detection with the UGR’16 | es_ES
dc.type | conference output | es_ES
dc.rights.accessRights | open access | es_ES
dc.type.hasVersion | SMUR | es_ES



Except where otherwise noted, this item's license is described as Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License