EDRS: Extremity-density representative selection for semi-supervised learning on imbalanced data

Durán López, Alberto; Bolaños Martinez, Daniel; Bermúdez Edo, María del Campo

doi:10.1016/j.ins.2026.123390

dc.contributor.author	Durán López, Alberto
dc.contributor.author	Bolaños Martinez, Daniel
dc.contributor.author	Bermúdez Edo, María del Campo
dc.date.accessioned	2026-03-23T07:47:24Z
dc.date.available	2026-03-23T07:47:24Z
dc.date.issued	2026-03-19
dc.identifier.citation	Durán-López, A., Bolaños-Martinez, D., & Bermudez-Edo, M. (2026). EDRS: Extremity-density representative selection for semi-supervised learning on imbalanced data. Information Sciences, 744(123390), 1-18. https://doi.org/10.1016/j.ins.2026.123390	es_ES
dc.identifier.uri	https://hdl.handle.net/10481/112360
dc.description	This work was supported by Grant C-SEJ-128-UGR23, funded by Consejería de Universidad, Investigación e Innovación and by ERDF Andalusia Program 2021-2027; project PID2023-149185OBI00 funded by MICIU/AEI/10.13039/501100011033 and by ERDF/EU. Funding for open access charge: Universidad de Granada/CBUA.	es_ES
dc.description.abstract	Representative sample selection improves training in semi-supervised learning (SSL), where labeled data are limited and must reflect the original dataset. Recent SSL methods ignore class imbalance and lack tabular data case studies. To fill this gap, we propose Extremity-Density Representative Selection (EDRS), a preprocessing point selection method for imbalanced tabular datasets. EDRS ranks unlabeled candidates by combining two scores: density, which favors regions with many individuals, and extremity, which ensures inclusion of extreme cases likely belonging to minority classes. We first cluster the data to ensure diverse and representative coverage of the space, and then select samples with the highest density and extremity values, balancing outlier avoidance with coverage of extreme values. EDRS is used to select samples for labeling in an SSL framework and is compared with Random Sampling, Stratified Sampling, K-Means–derived methods, USL, Hybrid-CEAL, FDMat, Gaussian Mapping, and ESC-FFS. We validate EDRS on twelve synthetic and six real-world imbalanced datasets using the SSL methods VIME, Manifold Mixup, and Contrastive Mixup. EDRS achieves a class imbalance ratio (IR) close to 1, is 99% faster than other algorithms with similar IR, and improves F1-score by 3–5% in well-separated classes. An ablation test evaluates the impact of density and extremity.	es_ES
dc.description.sponsorship	Consejería de Universidad, Investigación e Innovación/ERDF Andalusia C-SEJ-128-UGR23	es_ES
dc.description.sponsorship	MICIU/AEI/10.13039/501100011033/ERDF/EU PID2023-149185OBI00	es_ES
dc.description.sponsorship	Universidad de Granada/CBUA	es_ES
dc.language.iso	eng	es_ES
dc.publisher	Elsevier	es_ES
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject	Semi-supervised learning	es_ES
dc.subject	Internet of things (IoT)	es_ES
dc.subject	Imbalanced classes	es_ES
dc.subject	Representative sample selection	es_ES
dc.subject	Tabular data	es_ES
dc.title	EDRS: Extremity-density representative selection for semi-supervised learning on imbalanced data	es_ES
dc.type	journal article	es_ES
dc.rights.accessRights	open access	es_ES
dc.identifier.doi	10.1016/j.ins.2026.123390
dc.type.hasVersion	VoR	es_ES
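
The abstract reports a class imbalance ratio (IR) close to 1 without defining it. The sketch below assumes the common definition, the majority-class count divided by the minority-class count, so that IR = 1 means a perfectly balanced selection; the paper itself may define IR differently. The function name `imbalance_ratio` and the label lists are hypothetical and are not taken from the paper.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed IR definition)."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to compute an imbalance ratio")
    return max(counts.values()) / min(counts.values())


# Illustrative labels of two selected subsets (made-up data, not results from the paper).
skewed_pick = ["a"] * 18 + ["b"] * 2      # selection dominated by class "a"
balanced_pick = ["a"] * 10 + ["b"] * 10   # selection with equal class counts

print(imbalance_ratio(skewed_pick))    # 9.0
print(imbalance_ratio(balanced_pick))  # 1.0
```

Under this assumed definition, a selection method that draws minority-class samples as often as majority-class ones yields a value near 1, while a selection that mirrors a skewed class distribution yields a larger value.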

Files in this item

Name: EDRS Extremity-density represe ...
Size: 3.569 MB
Format: PDF
Description: Main article

This item appears in the following collection(s)

DLSI - Artículos

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International