<rdf:RDF xmlns:rdf="http://www.openarchives.org/OAI/2.0/rdf/" xmlns:ow="http://www.ontoweb.org/ontology/1#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:ds="http://dspace.org/ds/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:doc="http://www.lyncode.com/xoai" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/rdf/ http://www.openarchives.org/OAI/2.0/rdf.xsd">
   <ow:Publication rdf:about="oai:digibug.ugr.es:10481/64590">
      <dc:title>Big Data Preprocessing as the Bridge between Big Data and Smart Data: BigDaPSpark and BigDaPFlink Libraries</dc:title>
      <dc:creator>García Gil, Diego Jesús</dc:creator>
      <dc:creator>Alcalde Barros, Alejandro</dc:creator>
      <dc:creator>Luengo Martín, Julián</dc:creator>
      <dc:creator>García López, Salvador</dc:creator>
      <dc:creator>Herrera Triguero, Francisco</dc:creator>
      <dc:subject>Big Data</dc:subject>
      <dc:subject>Apache Spark</dc:subject>
      <dc:subject>Data Preprocessing</dc:subject>
      <dc:subject>Smart Data</dc:subject>
      <dc:subject>Imbalanced</dc:subject>
      <dc:subject>Classification</dc:subject>
      <dc:description>With the advent of Big Data, terabytes of data are generated and stored every second. This raw data is far from
perfect: it contains many imperfections (noise, missing values, etc.) and is not suitable for analysis,
as it would lead to wrong conclusions. Data preprocessing is the set of techniques devoted to polishing, cleaning,
fixing, and improving that raw data. With preprocessed data, we are able to find more patterns in it
and to better explain the underlying distribution of the data. This is what is called Smart Data: raw data
that has been preprocessed and is ready to be analyzed, data that contains valuable information that will
lead to knowledge. In this work, we present two Big Data libraries for achieving Smart Data from Big Data:
BigDaPSpark and BigDaPFlink. They are built on top of two Big Data frameworks, Apache Spark and Apache
Flink, respectively. Both libraries contain a series of algorithms for Big Data preprocessing, ranging from noise cleaning
to discretization and data reduction, among many others. Additionally, we illustrate the usage of the libraries
with two use cases.</dc:description>
      <dc:date>2020-12-02T11:07:28Z</dc:date>
      <dc:date>2019</dc:date>
      <dc:type>journal article</dc:type>
      <dc:identifier>García-Gil, D., Alcalde-Barros, A., Luengo, J., García, S., &amp; Herrera, F. (2019). Big Data Preprocessing as the Bridge between Big Data and Smart Data: BigDaPSpark and BigDaPFlink Libraries. In IoTBDS (pp. 324-331). [DOI: 10.5220/0007738503240331]</dc:identifier>
      <dc:identifier>http://hdl.handle.net/10481/64590</dc:identifier>
      <dc:identifier>10.5220/0007738503240331</dc:identifier>
      <dc:language>eng</dc:language>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/3.0/es/</dc:rights>
      <dc:rights>open access</dc:rights>
      <dc:rights>Atribución-NoComercial-SinDerivadas 3.0 España</dc:rights>
      <dc:publisher>ScitePress</dc:publisher>
   </ow:Publication>
</rdf:RDF>