UniBFS: A novel uniform-solution-driven binary feature selection algorithm for high-dimensional data

Ahadzadeh, Behrouz; Abdar, Moloud; Foroumandi, Mahdieh; Safara, Fatemeh; Khosravi, Abbas; García López, Salvador; Nagaratnam Suganthan, Ponnuthurai

doi:10.1016/j.swevo.2024.101715

Identifiers

URI: https://hdl.handle.net/10481/96275

DOI: 10.1016/j.swevo.2024.101715

Publisher

Elsevier

Subject

High-dimensional data classification

Evolutionary algorithms for selection

Swarm algorithms for selection

Date

2024-09-06

Bibliographic reference

Ahadzadeh, B. et al. Swarm and Evolutionary Computation 91 (2024) 101715. [https://doi.org/10.1016/j.swevo.2024.101715]

Sponsor

Qatar National Library

Abstract

Feature selection (FS) is a crucial technique in machine learning and data mining, serving a variety of purposes such as simplifying model construction, facilitating knowledge discovery, improving computational efficiency, and reducing memory consumption. Despite its importance, the constantly increasing search space of high-dimensional datasets poses significant challenges to FS methods, including the "curse of dimensionality," susceptibility to local optima, and high computational and memory costs. To overcome these challenges, a new FS algorithm named Uniform-solution-driven Binary Feature Selection (UniBFS) has been developed in this study. UniBFS exploits the inherent characteristic of binary algorithms, namely binary coding, to search the entire problem space for relevant features while avoiding irrelevant ones. To improve the effectiveness and efficiency of the UniBFS algorithm, a Redundant Features Elimination (RFE) algorithm is presented in this paper. RFE performs a local search in a very small subspace of the solutions obtained by UniBFS in different stages and removes the redundant features that do not increase the classification accuracy. Moreover, the study proposes a hybrid algorithm that combines UniBFS with two filter-based FS methods, ReliefF and Fisher, to identify pertinent features during the global search phase. The proposed algorithms are evaluated on 30 high-dimensional datasets ranging from 2000 to 54676 dimensions, and their effectiveness and efficiency are compared with state-of-the-art techniques, demonstrating their superiority.
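
The redundant-feature elimination idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `eliminate_redundant` and `toy_score`, the greedy one-pass removal order, and the toy scoring function are all assumptions made for illustration. A solution is a binary mask (1 = feature selected), and a selected feature is dropped whenever removing it does not lower the score.

```python
from typing import Callable, List

Mask = List[int]  # binary coding: 1 = feature selected, 0 = not selected


def eliminate_redundant(mask: Mask, score: Callable[[Mask], float]) -> Mask:
    """Greedy local search (hypothetical stand-in for the paper's RFE):
    drop each selected feature whose removal does not lower the score."""
    best = list(mask)
    best_score = score(best)
    for i, bit in enumerate(mask):
        if bit == 0:
            continue  # feature i is not selected, nothing to remove
        trial = list(best)
        trial[i] = 0
        trial_score = score(trial)
        if trial_score >= best_score:  # feature i added nothing: remove it
            best, best_score = trial, trial_score
    return best


def toy_score(mask: Mask) -> float:
    """Toy accuracy proxy in percent (assumption): only features 0 and 3
    are informative; every other feature contributes nothing."""
    return 50 + 20 * mask[0] + 30 * mask[3]


selected = eliminate_redundant([1, 1, 0, 1, 1], toy_score)
print(selected)  # [1, 0, 0, 1, 0]
```

In practice `score` would be the cross-validated accuracy of a classifier trained on the masked features; the toy function only keeps the example self-contained.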

Collections

DCCIA - Articles

Except where otherwise noted, this item's license is described as Attribution 4.0 International