A Comparative Analysis of the TDCGAN Model for Data Balancing and Intrusion Detection

Jamoos, Mohammad; Mora García, Antonio Miguel; AlKhanafseh, Mohammad; Surakhi, Ola

doi:10.3390/signals5030032

Identifiers

URI: https://hdl.handle.net/10481/95996

DOI: 10.3390/signals5030032

Publisher

MDPI

Subject

data balancing

deep learning

generative adversarial network

Date

2024-09-12

Bibliographic reference

Jamoos, M. et al. Signals 2024, 5(3), 580-596; [https://doi.org/10.3390/signals5030032]

Sponsor

Deanship of Scientific Research at AlQuds University; Spanish Ministry of Science, Innovation and Universities MICIU/AEI/10.13039/501100011033 and the European Union NextGenerationEU/PRTR, under projects TED2021-131699B-I00 and TED2021-129938B-I00; Projects PID2020-113462RB-I00, PID2020-115570GB-C22, and PID2023-147409NB-C21 of the Spanish Ministry of Economy and Competitiveness; Project C-ING-179-UGR23 financed by the “Consejería de Universidades, Investigación e Innovación” (Andalusian Government, FEDER Program 2021–2027)

Abstract

Owing to escalating network throughput and security risks, intrusion detection systems (IDSs) have attracted significant attention within the computer science field. The majority of modern IDSs are built using deep learning techniques. Nevertheless, these IDSs still have a shortcoming: most datasets used for IDS are highly imbalanced, with the volume of samples representing normal traffic significantly outweighing the volume representing attack traffic. This imbalance restricts the performance of deep learning classifiers on minority classes, as it can bias the classifier in favor of the majority class. To address this challenge, many solutions have been proposed in the literature. TDCGAN is an innovative Generative Adversarial Network (GAN) based on a model-driven approach used to address imbalanced data in IDS datasets. This paper investigates the performance of TDCGAN by employing it to balance data across four benchmark IDS datasets: CIC-IDS2017, CSE-CIC-IDS2018, KDD-cup 99, and BOT-IOT. Next, four machine learning methods are employed to classify the data, both on the imbalanced dataset and on the balanced dataset. The results obtained from each are then compared to identify the impact of dataset imbalance on classification accuracy. The results demonstrate a notable improvement in classification accuracy for each classifier after the TDCGAN model is applied for data balancing.
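
The bias toward the majority class described in the abstract can be illustrated with a minimal sketch. It does not implement TDCGAN: plain random oversampling stands in for the balancing step, and the 95/5 class split, the labels, and the function names are hypothetical choices made only for illustration.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class.
    (A simple stand-in for a generative balancer, not TDCGAN itself.)"""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def recall_per_class(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    total = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / total[c] for c in total}


# Hypothetical toy data: 95 "normal" flows and 5 "attack" flows.
labels = ["normal"] * 95 + ["attack"] * 5
samples = list(range(100))

# A degenerate classifier that always predicts the majority class.
pred = ["normal"] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, pred)) / len(labels)
recalls = recall_per_class(labels, pred)

# Balancing the training data so both classes are equally represented.
balanced_x, balanced_y = random_oversample(samples, labels)
counts = Counter(balanced_y)

print(accuracy, recalls, counts)
```

In this toy setting the always-"normal" classifier scores high overall accuracy while detecting no attacks at all, which is why per-class metrics and a balanced training set matter when comparing classifiers on imbalanced IDS data.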

Collections

OpenAIRE (Open Access Infrastructure for Research in Europe)

Except where otherwise noted, this item's license is described as Attribution 4.0 International