<rdf:RDF xmlns:rdf="http://www.openarchives.org/OAI/2.0/rdf/" xmlns:ow="http://www.ontoweb.org/ontology/1#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:ds="http://dspace.org/ds/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:doc="http://www.lyncode.com/xoai" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/rdf/ http://www.openarchives.org/OAI/2.0/rdf.xsd">
   <ow:Publication rdf:about="oai:digibug.ugr.es:10481/70391">
      <dc:title>FDR2-BD: A Fast Data Reduction Recommendation Tool for Tabular Big Data Classification Problems</dc:title>
      <dc:creator>Basgall, María José</dc:creator>
      <dc:creator>Naiouf, Marcelo</dc:creator>
      <dc:creator>Fernández Hilario, Alberto Luis</dc:creator>
      <dc:subject>Big Data</dc:subject>
      <dc:subject>Data reduction</dc:subject>
      <dc:subject>Classification</dc:subject>
      <dc:subject>Preprocessing techniques</dc:subject>
      <dc:subject>Apache Spark</dc:subject>
      <dc:description>In this paper, we present FDR2-BD, a methodological data condensation approach for reducing tabular big datasets in classification problems. The key idea of our proposal is to analyze data in a dual way (vertically and horizontally), providing a smart combination of feature selection, which generates dense clusters of data, and uniform sampling reduction, which keeps only a few representative samples from each problem area. Its main advantage is that the model's predictive quality is kept within a range determined by a user-defined threshold. Its robustness is built on a hyper-parametrization process in which all data are taken into consideration by following a k-fold procedure. It is also fast and scalable, since it uses fully optimized parallel operations provided by Apache Spark. An extensive experimental study is performed over 25 big datasets with different characteristics. In most cases, the obtained reduction percentages are above 95%, thus outperforming state-of-the-art solutions such as FCNN_MR, which barely reach 70%. The most promising outcome is that the representativeness of the original data is maintained, with predictive quality values within around 1% of the baseline.</dc:description>
      <dc:date>2021-09-23T08:13:55Z</dc:date>
      <dc:date>2021-09-23T08:13:55Z</dc:date>
      <dc:date>2021</dc:date>
      <dc:type>info:eu-repo/semantics/article</dc:type>
      <dc:identifier>Basgall, M.J.; Naiouf, M.; Fernández, A. FDR2-BD: A Fast Data Reduction Recommendation Tool for Tabular Big Data Classification Problems. Electronics 2021, 10, 1757. https://doi.org/10.3390/electronics10151757</dc:identifier>
      <dc:identifier>http://hdl.handle.net/10481/70391</dc:identifier>
      <dc:identifier>10.3390/electronics10151757</dc:identifier>
      <dc:language>eng</dc:language>
      <dc:rights>http://creativecommons.org/licenses/by/3.0/es/</dc:rights>
      <dc:rights>info:eu-repo/semantics/openAccess</dc:rights>
      <dc:rights>Attribution 3.0 Spain</dc:rights>
      <dc:publisher>MDPI</dc:publisher>
   </ow:Publication>
</rdf:RDF>