Evaluating the impact of different Feature as a Counter data aggregation approaches on the performance of NIDSs and their selected features

Magán Carrión, Roberto; Urda, Daniel; Diaz Cano, Ignacio; Dorronsoro, Bernabé

doi:10.1093/jigpal/jzae007


Identifiers

URI: https://hdl.handle.net/10481/91795

DOI: 10.1093/jigpal/jzae007


Publisher

Oxford University Press

Subject

Machine learning

Feature engineering

Feature selection

Date

2024-03-16

Bibliographic reference

Roberto Magán-Carrión, Daniel Urda, Ignacio Diaz-Cano, Bernabé Dorronsoro, Evaluating the impact of different Feature as a Counter data aggregation approaches on the performance of NIDSs and their selected features, Logic Journal of the IGPL, Volume 32, Issue 2, April 2024, Pages 263–280, https://doi.org/10.1093/jigpal/jzae007

Sponsor

Spanish Ministerio de Ciencia, Innovación y Universidades and the ERDF (RTI2018-100754-B-I00, RTI2018-098160-B-I00 and PID2020-114495RB-I00); ERDF under project FEDER-UCA18-108393 (OPTIMALE); Junta de Andalucía and ERDF (GENIUS–P18-2399); ‘Ayuda de recualificación’ funding by Ministerio de Universidades and the European Union-NextGenerationEU; Project NetSEA-GPT (C-ING-300-UGR23) funded by Consejería de Universidad, Investigación e Innovación and the European Union through the ERDF Andalusia Program 2021-2027

Abstract

Considerable effort is currently devoted to protecting communication networks against increasingly sophisticated cybersecurity attacks that look for system vulnerabilities to exploit for malicious purposes. Network Intrusion Detection Systems (NIDSs) are popular tools for detecting and classifying such attacks, and most of them are based on machine learning (ML) models. However, ML-based NIDSs cannot be trained on network traffic data as it is. A Feature Engineering (FE) process therefore plays a crucial role in transforming raw network traffic data into derived data suitable for ML models. In this work, we study the effects of applying one such FE technique, the Feature as a Counter approach, in different ways on the performance of two ML models (linear and non-linear) and on their selected features. The derived observations are computed either from a fixed number of raw samples (batch-based approaches) or by aggregating raw samples by time intervals (timestamp-based approach). Results show no significant differences between the proposed approaches, either in the performance of the models or in the selected features, which validates our proposal and makes it feasible for wide use as a standard FE method.
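The two aggregation strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record layout, the feature names (`proto_*`, `dport_*`), the sample data and the function names are all assumptions made for illustration. The sketch only shows the general idea of counting occurrences of feature values within each group of raw samples, where groups are formed either by a fixed sample count or by a time interval.

```python
from collections import Counter

# Hypothetical raw flow records: (timestamp in seconds, protocol, destination port).
# The layout and values are illustrative assumptions, not data from the paper.
RAW = [
    (0, "tcp", 80),
    (10, "tcp", 443),
    (20, "udp", 53),
    (70, "tcp", 80),
    (80, "udp", 53),
    (130, "tcp", 22),
]


def count_features(records):
    """Turn one group of raw records into a single observation of counters."""
    counts = Counter()
    for _, proto, port in records:
        counts[f"proto_{proto}"] += 1
        counts[f"dport_{port}"] += 1
    return dict(counts)


def batch_based(records, batch_size):
    """Build one derived observation per fixed number of raw samples."""
    return [
        count_features(records[i:i + batch_size])
        for i in range(0, len(records), batch_size)
    ]


def timestamp_based(records, interval):
    """Build one derived observation per time interval of `interval` seconds."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[0] // interval, []).append(rec)
    return [count_features(groups[key]) for key in sorted(groups)]


batch_obs = batch_based(RAW, batch_size=3)
time_obs = timestamp_based(RAW, interval=60)
```

With this toy data, the batch-based grouping always aggregates three raw samples per observation, whereas the timestamp-based grouping yields observations built from a varying number of raw samples, depending on how much traffic falls in each interval.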

Collections

OpenAIRE (Open Access Infrastructure for Research in Europe)

Except where otherwise noted, this item's license is described as Attribution 4.0 International