Transforming big data into smart data: An insight on the use of the k‐nearest neighbors algorithm to obtain quality data

Triguero, Isaac; García Gil, Diego Jesús; Maillo, Jesús; Luengo Martín, Julián; García López, Salvador; Herrera Triguero, Francisco

doi:https://doi.org/10.1002/widm.1289

dc.contributor.author	Triguero, Isaac
dc.contributor.author	García Gil, Diego Jesús
dc.contributor.author	Maillo, Jesús
dc.contributor.author	Luengo Martín, Julián
dc.contributor.author	García López, Salvador
dc.contributor.author	Herrera Triguero, Francisco
dc.date.accessioned	2025-01-16T10:54:21Z
dc.date.available	2025-01-16T10:54:21Z
dc.date.issued	2018-11-28
dc.identifier.citation	Triguero, I., García‐Gil, D., Maillo, J., Luengo, J., García, S., & Herrera, F. (2019). Transforming big data into smart data: An insight on the use of the k‐nearest neighbors algorithm to obtain quality data. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 9(2), e1289.	es_ES
dc.identifier.uri	https://hdl.handle.net/10481/99393
dc.description.abstract	The k-nearest neighbors algorithm is characterized as a simple yet effective data mining technique. The main drawback of this technique appears when massive amounts of data (likely to contain noise and imperfections) are involved, turning this algorithm into an imprecise and especially inefficient technique. These disadvantages have been the subject of research for many years, and among other approaches, data preprocessing techniques such as instance reduction or missing values imputation have targeted these weaknesses. As a result, these weaknesses have turned into strengths, and the k-nearest neighbors rule has become a core algorithm to identify and correct imperfect data, removing noisy and redundant samples, or imputing missing values, transforming Big Data into Smart Data, which is data of sufficient quality to expect a good outcome from any data mining algorithm. The role of this smart data gleaning algorithm in a supervised learning context is investigated. This includes a brief overview of Smart Data, current and future trends for the k-nearest neighbor algorithm in the Big Data context, and the existing data preprocessing techniques based on this algorithm. We present the emerging big data-ready versions of these algorithms and develop some new methods to cope with Big Data. We carry out a thorough experimental analysis on a series of big datasets that provides guidelines as to how to use the k-nearest neighbor algorithm to obtain Smart/Quality Data for a high-quality data mining process. Moreover, multiple Spark Packages have been developed including all the Smart Data algorithms analyzed.	es_ES
dc.description.sponsorship	This work is supported by the Spanish National Research Project TIN2017-89517-P and the Foundation BBVA project 75/2016 BigDaP-TOOLS (“Ayudas Fundación BBVA a Equipos de Investigación Científica 2016”). J. Maillo holds an FPU scholarship from the Spanish Ministry of Education.	es_ES
dc.language.iso	eng	es_ES
dc.publisher	Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery	es_ES
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject	big data	es_ES
dc.subject	data preprocessing	es_ES
dc.subject	instance reduction	es_ES
dc.subject	K nearest neighbours	es_ES
dc.subject	imperfect data	es_ES
dc.subject	smart data	es_ES
dc.subject	spark	es_ES
dc.title	Transforming big data into smart data: An insight on the use of the k‐nearest neighbors algorithm to obtain quality data	es_ES
dc.type	journal article	es_ES
dc.rights.accessRights	open access	es_ES
dc.identifier.doi	https://doi.org/10.1002/widm.1289
dc.type.hasVersion	AM	es_ES

Files in this item

Name: Triguero_et_al-2019-Wiley_Inte ...
Size: 3.217 MB
Format: PDF

This item appears in the following collection(s)

SCI2S - Artículos

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International