Big data preprocessing: enabling smart data
Identifiers
URI: https://hdl.handle.net/10481/99405
Author
Luengo Martín, Julián; García Gil, Diego Jesús; Ramírez-Gallego, Sergio; García López, Salvador; Herrera Triguero, Francisco
Publisher
Springer Cham
Subject
Big Data; Machine Learning; Information Systems and Communication Service
Date
2020-03-16
Bibliographic reference
Luengo, J., García-Gil, D., Ramírez-Gallego, S., García, S., & Herrera, F. (2020). Big data preprocessing. Cham: Springer.
Abstract
Recent years have seen massive growth in the scale of data, which is
a key factor in the Big Data scenario. Big Data can be defined as data of high volume,
velocity, and variety that requires new forms of high-performance processing.
Addressing Big Data is a challenging and time-demanding task that requires a
large computational infrastructure to ensure successful data processing and analysis.
Because this scenario is very common in real-life applications, the interest of researchers
and practitioners in the topic has grown significantly in recent years. Among Big
Data disciplines, data mining is a key topic, enabling the user to extract knowledge
from enormous amounts of raw data. However, this raw data is not always in the best
condition to be treated, analyzed, and surveyed. Applying preprocessing
techniques is a must in real-world applications to obtain quality data, or Smart Data,
suitable for proper treatment and analysis. The term Smart Data refers to the challenge of
transforming raw data into quality data that can be appropriately exploited to obtain
valuable insights.
This book aims to offer a general and comprehensible overview of data
preprocessing in Big Data, enabling Smart Data. It contains a comprehensive
description of the topic and focuses on its main features and the most relevant
proposed solutions. Additionally, it considers the different scenarios in Big Data for
which the application of data preprocessing techniques can pose a real challenge.
Data preprocessing is a multifaceted discipline that includes data preparation,
comprising integration, cleaning, normalization, and transformation of data;
data reduction tasks such as feature selection, instance selection, and discretization;
and resampling techniques to deal with imbalanced data.
This book stresses the gap between standard data preprocessing techniques and their
Big Data equivalents, showing how challenging it is to develop
the latter. It also covers the different approaches that have been traditionally
applied and the latest proposals in Big Data preprocessing. Specifically, it reviews
data reduction methods, imperfect data approaches, discretization techniques, and
imbalanced data preprocessing solutions. Finally, this book describes the most popular
Big Data libraries for machine learning, focusing on their data preprocessing
algorithms and utilities.