EDRS: Extremity-density representative selection for semi-supervised learning on imbalanced data
Metadata
Publisher
Elsevier
Subject
Semi-supervised learning; Internet of things (IoT); Imbalanced classes; Representative sample selection; Tabular data
Date
2026-03-19
Bibliographic reference
Durán-López, A., Bolaños-Martinez, D., & Bermudez-Edo, M. (2026). EDRS: Extremity-density representative selection for semi-supervised learning on imbalanced data. Information Sciences, 744(123390), 1-18. https://doi.org/10.1016/j.ins.2026.123390
Sponsorship
Consejería de Universidad, Investigación e Innovación/ERDF Andalusia C-SEJ-128-UGR23; MICIU/AEI/10.13039/501100011033/ERDF/EU PID2023-149185OBI00; Universidad de Granada/CBUA
Abstract
Representative sample selection improves training in semi-supervised learning (SSL) where labeled data are limited and must reflect the original dataset. Recent SSL methods ignore class imbalance and lack tabular data case studies. To fill this gap, we propose Extremity-Density Representative Selection (EDRS), a preprocessing point selection method for imbalanced tabular datasets. EDRS ranks unlabeled candidates by combining two scores: density, which favors regions with many individuals, and extremity, which ensures inclusion of extreme cases likely belonging to minority classes. We first cluster the data to ensure diverse and representative coverage of the space, and then select samples with the highest density and extremity values, balancing outlier avoidance with coverage of extreme values. EDRS is used to select samples for labeling in an SSL framework and is compared with Random Sampling, Stratified Sampling, K-Means–derived methods, USL, Hybrid-CEAL, FDMat, Gaussian Mapping, and ESC-FFS. We validate EDRS on twelve synthetic and six real-world imbalanced datasets using SSL VIME, Manifold Mixup and Contrastive Mixup. EDRS achieves a class imbalance ratio (IR) close to 1 and is 99% faster than other algorithms with similar IR, improves F1-score by 3–5% in well-separated classes, and includes an ablation test evaluating the impact of density and extremity.
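
The pipeline described in the abstract (cluster first, then rank candidates by a combined density and extremity score) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the score formulas, so everything below is an assumption made for illustration. Density is taken as the inverse mean distance to the nearest neighbors, extremity as the largest absolute z-score over the features, the two are combined with a hypothetical weight `alpha`, and the function names (`kmeans_labels`, `density_scores`, `extremity_scores`, `select_representatives`) are invented here. The imbalance ratio uses the common definition of majority count divided by minority count.

```python
import math
import random
from collections import Counter


def kmeans_labels(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    centers = random.Random(seed).sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels


def minmax(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def density_scores(points, n_neighbors=5):
    """ASSUMED density: inverse mean distance to the nearest neighbors."""
    if len(points) < 2:
        return [0.0] * len(points)
    raw = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:n_neighbors]
        raw.append(1.0 / (1e-12 + sum(nearest) / len(nearest)))
    return minmax(raw)


def extremity_scores(points):
    """ASSUMED extremity: largest absolute z-score over the features."""
    cols = list(zip(*points))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    raw = [max(abs(v - m) / s for v, m, s in zip(p, means, stds))
           for p in points]
    return minmax(raw)


def select_representatives(points, budget, k=3, alpha=0.5, seed=0):
    """Pick `budget` indices, cycling over clusters by combined score."""
    labels = kmeans_labels(points, k, seed=seed)
    dens, extr = density_scores(points), extremity_scores(points)
    # Hypothetical combination: weighted sum of the two normalized scores.
    combined = [alpha * d + (1 - alpha) * e for d, e in zip(dens, extr)]
    ranked = {c: sorted((i for i, lab in enumerate(labels) if lab == c),
                        key=lambda i: combined[i], reverse=True)
              for c in range(k)}
    chosen = []
    while len(chosen) < budget and any(ranked.values()):
        for c in range(k):  # round-robin keeps every cluster covered
            if ranked[c] and len(chosen) < budget:
                chosen.append(ranked[c].pop(0))
    return chosen


def imbalance_ratio(class_labels):
    """Majority class count divided by minority class count."""
    counts = Counter(class_labels)
    return max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    rng = random.Random(42)
    majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(95)]
    minority = [(rng.gauss(6, 0.5), rng.gauss(6, 0.5)) for _ in range(5)]
    picked = select_representatives(majority + minority, budget=10)
    print("IR of the full data:", imbalance_ratio([0] * 95 + [1] * 5))
    print("indices proposed for labeling:", picked)
```

In this sketch, an isolated outlier scores high on extremity but low on density, while a point deep inside a large cluster scores the other way around, so the weighted sum is one simple way to trade off the two goals the abstract mentions. The round-robin over clusters is likewise only one possible way to enforce coverage of the feature space.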





