On the Suitability of Bagging-Based Ensembles with Borderline Label Noise
Metadata
Publisher
MDPI
Subject
Borderline noise; Label noise; Bagging; Ensembles; Robust learners; Classification
Date
2022-06-01
Bibliographic reference
Sáez, J.A.; Romero-Béjar, J.L. On the Suitability of Bagging-Based Ensembles with Borderline Label Noise. Mathematics 2022, 10, 1892. [https://doi.org/10.3390/math10111892]
Sponsor
MCIU/AEI/ERDF, EU PGC2018-098860-B-I00; ERDF Operational Programme 2014-2020 A-FQM-345-UGR18; Economy and Knowledge Council of the Regional Government of Andalusia, Spain; MCIN/AEI CEX2020-001105-M
Abstract
Real-world classification data usually contain noise, which can affect the accuracy of the
models and their complexity. In this context, an interesting approach to reduce the effects of noise is
building ensembles of classifiers, which traditionally have been credited with the ability to tackle
difficult problems. Among the alternatives to build ensembles with noisy data, bagging has shown
some potential in the specialized literature. However, existing works in this field are limited and
focus only on noise produced by random mislabeling, which is unlikely to occur in
real-world applications. Recent research shows that other types of noise, such as that occurring at
class boundaries, are more common and more challenging for classification algorithms. This paper
analyzes the use of bagging techniques in these complex problems, in which noise affects
the decision boundaries among classes. To investigate whether bagging is able to reduce
the impact of borderline noise, an experimental study is carried out considering a large number
of datasets with different noise levels, together with several noise models and classification algorithms. The
results show that bagging achieves better accuracy and robustness than the individual
models with this complex type of noise. The highest improvements in average accuracy are around
2–4% and are generally found at medium-high noise levels (from 15–20% onwards). Because each
subsample that bagging draws from the original training set contains only some of the noisy samples,
each model may have only parts of its decision boundaries among classes impaired,
which reduces the impact of noise on the ensemble as a whole.
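
The closing observation of the abstract, that each bagging subsample takes in only part of the noisy samples, can be illustrated with a small simulation. The sketch below is not the authors' experimental code. It assumes standard bootstrap sampling (as many draws as training samples, with replacement), which the abstract does not specify, and it uses hypothetical values for the training set size, noise rate, and number of models. It does not model where the noisy samples lie (borderline or otherwise); it only counts how many distinct noisy samples each model would see.

```python
import random


def bootstrap_noisy_coverage(n_samples=1000, noise_rate=0.2, n_models=25, seed=0):
    """For each bootstrap subsample, return the fraction of distinct
    noisy samples that it contains.

    All parameter values are hypothetical and chosen for illustration only.
    """
    rng = random.Random(seed)
    n_noisy = int(n_samples * noise_rate)
    # Assumption: the first n_noisy indices stand for the mislabeled samples.
    noisy_ids = set(range(n_noisy))

    coverages = []
    for _ in range(n_models):
        # Standard bootstrap: n_samples draws with replacement,
        # reduced to the set of distinct indices that were drawn.
        drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
        coverages.append(len(drawn & noisy_ids) / n_noisy)
    return coverages


coverages = bootstrap_noisy_coverage()
mean_coverage = sum(coverages) / len(coverages)
print(f"mean fraction of noisy samples seen per model: {mean_coverage:.3f}")
print(f"min / max across models: {min(coverages):.3f} / {max(coverages):.3f}")
```

Under these assumptions, a bootstrap subsample of size n is expected to contain a fraction 1 - (1 - 1/n)^n of the distinct training samples, which approaches 1 - 1/e (about 63%) as n grows, so each model is trained on a different, incomplete subset of the noisy samples.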