Show simple item record

dc.contributor.author: Basgall, María José
dc.contributor.author: Naiouf, Marcelo
dc.contributor.author: Fernández Hilario, Alberto Luis
dc.date.accessioned: 2021-09-23T08:13:55Z
dc.date.available: 2021-09-23T08:13:55Z
dc.date.issued: 2021
dc.identifier.citation: Basgall, M.J.; Naiouf, M.; Fernández, A. FDR2-BD: A Fast Data Reduction Recommendation Tool for Tabular Big Data Classification Problems. Electronics 2021, 10, 1757. https://doi.org/10.3390/electronics10151757 [es_ES]
dc.identifier.uri: http://hdl.handle.net/10481/70391
dc.description.abstract: In this paper, a methodological data condensation approach for reducing tabular big datasets in classification problems is presented, named FDR2-BD. The key of our proposal is to analyze data in a dual way (vertical and horizontal), so as to provide a smart combination between feature selection to generate dense clusters of data and uniform sampling reduction to keep only a few representative samples from each problem area. Its main advantage is allowing the model’s predictive quality to be kept in a range determined by a user’s threshold. Its robustness is built on a hyper-parametrization process, in which all data are taken into consideration by following a k-fold procedure. Another significant capability is being fast and scalable by using fully optimized parallel operations provided by Apache Spark. An extensive experimental study is performed over 25 big datasets with different characteristics. In most cases, the obtained reduction percentages are above 95%, thus outperforming state-of-the-art solutions such as FCNN_MR that barely reach 70%. The most promising outcome is maintaining the representativeness of the original data information, with quality prediction values around 1% of the baseline. [es_ES]
dc.language.iso: eng [es_ES]
dc.publisher: MDPI [es_ES]
dc.rights: Attribution 3.0 Spain [*]
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/es/ [*]
dc.subject: Big Data [es_ES]
dc.subject: Data reduction [es_ES]
dc.subject: Classification [es_ES]
dc.subject: Preprocessing techniques [es_ES]
dc.subject: Apache Spark [es_ES]
dc.title: FDR2-BD: A Fast Data Reduction Recommendation Tool for Tabular Big Data Classification Problems [es_ES]
dc.type: info:eu-repo/semantics/article [es_ES]
dc.rights.accessRights: info:eu-repo/semantics/openAccess [es_ES]
dc.identifier.doi: 10.3390/electronics10151757


Files in this item

[PDF]

This item appears in the following collection(s)

Except where otherwise noted, this item's license is described as Attribution 3.0 Spain