Big data preprocessing: enabling smart data

Luengo Martín, Julián; García Gil, Diego Jesús; Ramírez-Gallego, Sergio; García López, Salvador; Herrera Triguero, Francisco

doi:https://doi.org/10.1007/978-3-030-39105-8

2020_Book_BigDataPreprocessing.pdf (3.709Mb)

Identifiers

URI: https://hdl.handle.net/10481/99405

DOI: https://doi.org/10.1007/978-3-030-39105-8

Publisher

Springer Cham

Subject

Big Data

Machine Learning

Information Systems and Communication Service

Date

2020-03-16

Bibliographic reference

Luengo, J., García-Gil, D., Ramírez-Gallego, S., García, S., & Herrera, F. (2020). Big data preprocessing. Cham: Springer.

Abstract

Recent years have seen massive growth in the scale of data, a key factor of the Big Data scenario. Big Data can be defined as data of high volume, velocity, and variety that requires new high-performance processing. Addressing Big Data is a challenging and time-demanding task that requires a large computational infrastructure to ensure successful data processing and analysis. Because this scenario is very common in real-life applications, the interest of researchers and practitioners in the topic has grown significantly during these years. Among Big Data disciplines, data mining is a key topic, enabling the user to extract knowledge from enormous amounts of raw data. However, this raw data is not always in the best condition to be treated, analyzed, and surveyed. Applying preprocessing techniques is a must in real-world applications to ensure quality data, or Smart Data, for proper treatment and analysis. The term Smart Data refers to the challenge of transforming raw data into quality data that can be appropriately exploited to obtain valuable insights. This book aims to offer a general and comprehensible overview of data preprocessing in Big Data, enabling Smart Data. It contains a comprehensive description of the topic and focuses on its main features and the most relevant proposed solutions. Additionally, it considers the different scenarios in Big Data for which the application of data preprocessing techniques can pose a real challenge. Data preprocessing is a multifaceted discipline that includes data preparation, comprising integration, cleaning, normalization, and transformation of data; data reduction tasks such as feature selection, instance selection, and discretization; and resampling techniques to deal with imbalanced data. This book stresses the gap between standard data preprocessing techniques and their Big Data equivalents, showing the difficulties in developing the latter.
It also covers the different approaches that have been traditionally applied and the latest proposals in Big Data preprocessing. Specifically, it reviews data reduction methods, imperfect data approaches, discretization techniques, and imbalanced data preprocessing solutions. Finally, this book describes the most popular Big Data libraries for machine learning, focusing on their data preprocessing algorithms and utilities.
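
As a rough illustration of three of the preprocessing tasks the abstract names (normalization, discretization, and resampling for imbalanced data), the following single-machine Python sketch may help. It is not taken from the book and is not a Big Data implementation; the function names, the sample values, and the choice of min-max scaling, equal-width binning, and random oversampling are all assumptions made for this example.

```python
import random
from collections import Counter


def min_max_normalize(values):
    """Rescale values to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_discretize(values, n_bins):
    """Map each value to the 0-based index of one of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last interval, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data, chosen only to exercise the three functions.
ages = [18, 25, 40, 62, 90]
normalized = min_max_normalize(ages)
bins = equal_width_discretize(ages, n_bins=3)
rows, labels = random_oversample([[1], [2], [3], [4]], ["a", "a", "a", "b"])
```

Each function here needs a global statistic (minimum, maximum, or class counts) before it can transform a single record. On partitioned data that statistic has to be aggregated across all partitions first, which is one plausible reading of the gap between standard techniques and their Big Data equivalents that the abstract describes.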

Collections

SCI2S - Books

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International