| dc.contributor.author | Basgall, María José | |
| dc.contributor.author | Naiouf, Marcelo | |
| dc.contributor.author | Fernández Hilario, Alberto Luis | |
| dc.date.accessioned | 2021-09-23T08:13:55Z | |
| dc.date.available | 2021-09-23T08:13:55Z | |
| dc.date.issued | 2021 | |
| dc.identifier.citation | Basgall, M.J.; Naiouf, M.; Fernández, A. FDR2-BD: A Fast Data Reduction Recommendation Tool for Tabular Big Data Classification Problems. Electronics 2021, 10, 1757. https://doi.org/10.3390/electronics10151757 | es_ES |
| dc.identifier.uri | http://hdl.handle.net/10481/70391 | |
| dc.description.abstract | In this paper, a methodological data condensation approach for reducing tabular big datasets in classification problems is presented, named FDR2-BD. The key of our proposal is to analyze data in a dual way (vertical and horizontal), so as to provide a smart combination between feature selection to generate dense clusters of data and uniform sampling reduction to keep only a few representative samples from each problem area. Its main advantage is allowing the model’s predictive quality to be kept in a range determined by a user’s threshold. Its robustness is built on a hyper-parametrization process, in which all data are taken into consideration by following a k-fold procedure. Another significant capability is being fast and scalable by using fully optimized parallel operations provided by Apache Spark. An extensive experimental study is performed over 25 big datasets with different characteristics. In most cases, the obtained reduction percentages are above 95%, thus outperforming state-of-the-art solutions such as FCNN_MR that barely reach 70%. The most promising outcome is maintaining the representativeness of the original data information, with quality prediction values around 1% of the baseline. | es_ES |
| dc.language.iso | eng | es_ES |
| dc.publisher | MDPI | es_ES |
| dc.rights | Attribution 3.0 Spain | * |
| dc.rights.uri | http://creativecommons.org/licenses/by/3.0/es/ | * |
| dc.subject | Big Data | es_ES |
| dc.subject | Data reduction | es_ES |
| dc.subject | Classification | es_ES |
| dc.subject | Preprocessing techniques | es_ES |
| dc.subject | Apache Spark | es_ES |
| dc.title | FDR2-BD: A Fast Data Reduction Recommendation Tool for Tabular Big Data Classification Problems | es_ES |
| dc.type | journal article | es_ES |
| dc.rights.accessRights | open access | es_ES |
| dc.identifier.doi | 10.3390/electronics10151757 | |