A Comparative Analysis of the TDCGAN Model for Data Balancing and Intrusion Detection
Metadata
Publisher
MDPI
Subject
data balancing; deep learning; generative adversarial network
Date
2024-09-12
Bibliographic reference
Jamoos, M. et al. Signals 2024, 5(3), 580-596; [https://doi.org/10.3390/signals5030032]
Sponsor
Deanship of Scientific Research at AlQuds University; Spanish Ministry of Science, Innovation and Universities MICIU/AEI/10.13039/501100011033 and by the European Union NextGenerationEU/PRTR, under projects TED2021-131699B-I00 and TED2021-129938B-I00; Projects PID2020-113462RB-I00, PID2020-115570GB-C22, and PID2023-147409NB-C21 of the Spanish Ministry of Economy and Competitiveness; Project C-ING-179-UGR23 financed by the “Consejería de Universidades, Investigación e Innovación” (Andalusian Government, FEDER Program 2021–2027)
Abstract
Due to escalating network throughput and security risks, the exploration of intrusion
detection systems (IDSs) has garnered significant attention within the computer science field. The
majority of modern IDSs are constructed using deep learning techniques. Nevertheless, these IDSs
still have shortcomings stemming from the high imbalance of most datasets used for IDSs, where the
volume of samples representing normal traffic significantly outweighs that representing attack
traffic. This imbalance restricts the performance of deep learning classifiers on minority classes,
as it can bias the classifier in favor of the majority class. To address this challenge, many solutions
have been proposed in the literature. TDCGAN is an innovative Generative Adversarial Network (GAN)
based on a model-driven approach used to address imbalanced data in IDS datasets. This paper
investigates the performance of TDCGAN by employing it to balance data across four benchmark IDS
datasets: CIC-IDS2017, CSE-CIC-IDS2018, KDD-cup 99, and BOT-IOT. Next, four machine
learning methods are employed to classify the data, both on the imbalanced dataset and on the
balanced dataset. A comparison is then conducted between the results obtained from each to identify
the impact of having an imbalanced dataset on classification accuracy. The results demonstrated a
notable enhancement in the classification accuracy for each classifier after the implementation of the
TDCGAN model for data balancing.
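
The effect of class imbalance described in the abstract can be illustrated with a minimal sketch. It is not the TDCGAN model: plain random oversampling (duplicating minority samples) stands in for GAN-generated synthetic samples, and the function names `oversample_to_balance` and `per_class_recall`, the class labels, and the 95/5 split are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def oversample_to_balance(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until every class is as
    large as the largest class. This is a simple stand-in for a generative
    balancer; it does NOT synthesize new samples as a GAN would."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: hits[cls] / totals[cls] for cls in totals}


# Hypothetical toy data: 95 "normal" records and 5 "attack" records.
X = [[i] for i in range(100)]
y = ["normal"] * 95 + ["attack"] * 5

# A classifier biased toward the majority class scores high overall
# accuracy while missing every attack.
always_normal = ["normal"] * len(y)
accuracy = sum(t == p for t, p in zip(y, always_normal)) / len(y)
recall = per_class_recall(y, always_normal)

# After balancing, both classes have the same number of samples.
X_bal, y_bal = oversample_to_balance(X, y)
balanced_counts = Counter(y_bal)
```

In this toy setup the always-"normal" predictor reaches 95% accuracy with zero recall on the attack class, which is why comparing classifiers on imbalanced versus balanced data benefits from per-class metrics alongside overall accuracy.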