Evaluating the impact of different Feature as a Counter data aggregation approaches on the performance of NIDSs and their selected features
Metadata
Publisher
Oxford University Press
Subject
Machine learning; Feature engineering; Feature selection
Date
2024-03-16
Bibliographic reference
Roberto Magán-Carrión, Daniel Urda, Ignacio Diaz-Cano, Bernabé Dorronsoro, Evaluating the impact of different Feature as a Counter data aggregation approaches on the performance of NIDSs and their selected features, Logic Journal of the IGPL, Volume 32, Issue 2, April 2024, Pages 263–280, https://doi.org/10.1093/jigpal/jzae007
Sponsorship
Spanish Ministerio de Ciencia, Innovación y Universidades and the ERDF (RTI2018-100754-B-I00, RTI2018-098160-B-I00 and PID2020-114495RB-I00); ERDF under project FEDER-UCA18-108393 (OPTIMALE); Junta de Andalucía and ERDF (GENIUS–P18-2399); ‘Ayuda de recualificación’ funding by Ministerio de Universidades and the European Union-NextGenerationEU; Project NetSEA-GPT (C-ING-300-UGR23) funded by Consejería de Universidad, Investigación e Innovación and the European Union through the ERDF Andalusia Program 2021-2027
Abstract
There is considerable effort nowadays to protect communication networks against cybersecurity attacks, which are increasingly sophisticated and look for system vulnerabilities that can be exploited for malicious purposes. Network Intrusion Detection Systems (NIDSs) are popular tools to detect and classify such attacks, and most of them are based on machine learning (ML) models. However, ML-based NIDSs cannot be trained directly on raw network traffic data. Thus, a Feature Engineering (FE) process plays a crucial role in transforming raw network traffic data into derived data suitable for ML models. In this work, we study the effects of applying one such FE technique, the Feature as a Counter approach, in different ways on the performance of two ML models (one linear and one non-linear) and on their selected features. The derived observations are computed either from groups containing the same number of raw samples (batch-based approaches) or by aggregating raw samples over time intervals (timestamp-based approach). Results show that there are no significant differences between the proposed approaches, either in the performance of the models or in the selected features. This validates our proposal and makes it feasible to be widely used as a standard FE method.
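To make the difference between the two aggregation strategies concrete, here is a minimal Python sketch of counter-based aggregation. It is not the authors' implementation. It assumes that each raw record is a timestamp in seconds plus a dictionary of categorical fields, and that the counter variables are "feature=value" strings fixed in advance. The names `faac_counts`, `batch_based` and `timestamp_based`, and the toy records, are hypothetical and serve only as illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical raw record: (timestamp in seconds, {feature_name: categorical_value})
Record = Tuple[float, Dict[str, str]]


def faac_counts(records: Iterable[Record], variables: Sequence[str]) -> Dict[str, int]:
    """Count occurrences of each predefined 'feature=value' variable in a group of records."""
    counter: Counter = Counter()
    for _, fields in records:
        for name, value in fields.items():
            counter[f"{name}={value}"] += 1
    # One derived observation: every counter variable, zero if it never occurred
    return {var: counter[var] for var in variables}


def batch_based(records: List[Record], batch_size: int,
                variables: Sequence[str]) -> List[Dict[str, int]]:
    """One observation per consecutive group of `batch_size` raw records."""
    return [faac_counts(records[i:i + batch_size], variables)
            for i in range(0, len(records), batch_size)]


def timestamp_based(records: Iterable[Record], interval: float,
                    variables: Sequence[str]) -> List[Dict[str, int]]:
    """One observation per time interval of length `interval` seconds.

    Intervals that contain no records produce no observation in this sketch.
    """
    buckets: Dict[int, List[Record]] = {}
    for rec in records:
        buckets.setdefault(int(rec[0] // interval), []).append(rec)
    return [faac_counts(buckets[k], variables) for k in sorted(buckets)]


# Toy usage with made-up flow-like records
records: List[Record] = [
    (0.5, {"proto": "tcp", "dport": "80"}),
    (1.2, {"proto": "udp", "dport": "53"}),
    (1.9, {"proto": "tcp", "dport": "443"}),
    (4.1, {"proto": "tcp", "dport": "80"}),
]
variables = ["proto=tcp", "proto=udp", "dport=80", "dport=53", "dport=443"]

by_batch = batch_based(records, batch_size=2, variables=variables)
by_time = timestamp_based(records, interval=2.0, variables=variables)
print(by_batch)
print(by_time)
```

With these toy records, the batch-based variant yields two observations of two records each. The timestamp-based variant groups the first three records into the interval [0, 2) s and the last one into [4, 6) s. Both variants produce observations with the same counter variables, so the same ML model can consume either one.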