UniBFS: A novel uniform-solution-driven binary feature selection algorithm for high-dimensional data
Metadata
Author
Ahadzadeh, Behrouz; Abdar, Moloud; Foroumandi, Mahdieh; Safara, Fatemeh; Khosravi, Abbas; García López, Salvador; Nagaratnam Suganthan, Ponnuthurai
Publisher
Elsevier
Subject
High-dimensional data classification; Evolutionary algorithms for selection; Swarm algorithms for selection
Date
2024-09-06
Bibliographic reference
Ahadzadeh, B. et al. Swarm and Evolutionary Computation 91 (2024) 101715. [https://doi.org/10.1016/j.swevo.2024.101715]
Sponsor
Qatar National Library
Abstract
Feature selection (FS) is a crucial technique in machine learning and data mining, serving a variety of purposes
such as simplifying model construction, facilitating knowledge discovery, improving computational efficiency,
and reducing memory consumption. Despite its importance, the constantly increasing search space of high-dimensional
datasets poses significant challenges to FS methods, including issues like the "curse of dimensionality,"
susceptibility to local optima, and high computational and memory costs. To overcome these challenges, a
new FS algorithm named Uniform-solution-driven Binary Feature Selection (UniBFS) has been developed in this
study. UniBFS exploits the inherent characteristic of binary algorithms, binary coding, to search the entire
problem space for identifying relevant features while avoiding irrelevant ones. To improve the effectiveness and
efficiency of the UniBFS algorithm, a Redundant Features Elimination (RFE) algorithm is presented in this paper.
RFE performs a local search in a very small subspace of the solutions obtained by UniBFS in different stages, and
removes the redundant features which do not increase the classification accuracy. Moreover, the study proposes
a hybrid algorithm that combines UniBFS with two filter-based FS methods, ReliefF and Fisher, to identify
pertinent features during the global search phase. The proposed algorithms are evaluated on 30 high-dimensional
datasets ranging from 2000 to 54676 dimensions, and their effectiveness and efficiency are compared with
state-of-the-art techniques, demonstrating their superiority.
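
The redundant-feature removal described in the abstract can be illustrated with a minimal sketch. This is not the authors' RFE implementation: it assumes a solution is a binary mask (1 = feature selected) and that a scoring function returning classification accuracy is available. The names eliminate_redundant_features and toy_score are hypothetical, and toy_score is a stand-in for a real classifier evaluation.

```python
from typing import Callable, List, Sequence


def eliminate_redundant_features(
    mask: Sequence[int],
    score: Callable[[List[int]], float],
) -> List[int]:
    """Greedy backward pass over a binary feature mask.

    Each selected feature is switched off in turn; the removal is kept
    only if the score does not decrease. This is an illustrative
    assumption about how such a local search might look.
    """
    current = list(mask)
    best = score(current)
    for i in range(len(current)):
        if current[i] == 0:
            continue  # feature is not selected, nothing to remove
        current[i] = 0  # tentatively drop feature i
        trial = score(current)
        if trial >= best:
            best = trial  # removal did not hurt: keep it
        else:
            current[i] = 1  # removal lowered the score: restore
    return current


def toy_score(mask: List[int]) -> float:
    """Hypothetical accuracy: only features 0 and 2 carry information."""
    relevant = (0, 2)
    return sum(mask[i] for i in relevant) / len(relevant)


if __name__ == "__main__":
    start = [1, 1, 1, 0, 1]
    print(eliminate_redundant_features(start, toy_score))
```

In practice the scoring function would wrap a classifier evaluated on the masked dataset, so each trial removal costs one evaluation. That cost is why a local search of this kind is only affordable on a small subspace of the solution.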





