MRPR: A MapReduce Solution for Prototype Reduction in Big Data Classification

Triguero, Isaac; Peralta, Daniel; Bacardit, Jaume; García López, Salvador; Herrera Triguero, Francisco

doi:10.1016/j.neucom.2014.04.078

dc.contributor.author	Triguero, Isaac
dc.contributor.author	Peralta, Daniel
dc.contributor.author	Bacardit, Jaume
dc.contributor.author	García López, Salvador
dc.contributor.author	Herrera Triguero, Francisco
dc.date.accessioned	2021-01-21T07:59:12Z
dc.date.available	2021-01-21T07:59:12Z
dc.date.issued	2014-03-03
dc.identifier.citation	Published version: Triguero, I., Peralta, D., Bacardit, J., García, S., & Herrera, F. (2015). MRPR: A MapReduce solution for prototype reduction in big data classification. Neurocomputing, 150, 331-345. [https://doi.org/10.1016/j.neucom.2014.04.078]	es_ES
dc.identifier.uri	http://hdl.handle.net/10481/65872
dc.description	Supported by the Research Projects TIN2011-28488, P10-TIC-6858 and P11-TIC-7765. D. Peralta holds an FPU scholarship from the Spanish Ministry of Education and Science (FPU12/04902).	es_ES
dc.description.abstract	In the era of big data, analyzing and extracting knowledge from large-scale data sets is an interesting and challenging task. Applying standard data mining tools to such data sets is not straightforward. Hence, a new class of scalable mining methods that embrace the huge storage and processing capacity of cloud platforms is required. In this work, we propose a novel distributed partitioning methodology for prototype reduction techniques in nearest neighbor classification. These methods aim to represent original training data sets with a reduced number of instances. Their main purposes are to speed up the classification process and to reduce the storage requirements and sensitivity to noise of the nearest neighbor rule. However, standard prototype reduction methods cannot cope with very large data sets. To overcome this limitation, we develop a MapReduce-based framework that distributes the execution of these algorithms across a cluster of computing elements, and we propose several algorithmic strategies to integrate multiple partial solutions (reduced sets of prototypes) into a single one. The proposed model enables prototype reduction algorithms to be applied to big data classification problems without significant accuracy loss. We test the speed-up capabilities of our model on data sets of up to 5.7 million instances. The results show that this model is a suitable tool to enhance the performance of the nearest neighbor classifier with big data.	es_ES
dc.description.sponsorship	Spanish Ministry of Education and Science FPU12/04902	es_ES
dc.description.sponsorship	TIN2011-28488	es_ES
dc.description.sponsorship	P10-TIC-6858	es_ES
dc.description.sponsorship	P11-TIC-7765	es_ES
dc.language.iso	eng	es_ES
dc.publisher	Elsevier	es_ES
dc.rights	Atribución-NoComercial-SinDerivadas 3.0 España	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/	*
dc.subject	Big Data	es_ES
dc.subject	Mahout	es_ES
dc.subject	Hadoop	es_ES
dc.subject	Prototype reduction	es_ES
dc.subject	Prototype generation	es_ES
dc.subject	Nearest neighbor classification	es_ES
dc.title	MRPR: A MapReduce Solution for Prototype Reduction in Big Data Classification	es_ES
dc.type	journal article	es_ES
dc.rights.accessRights	open access	es_ES
dc.identifier.doi	10.1016/j.neucom.2014.04.078
dc.type.hasVersion	SMUR	es_ES
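
The abstract describes splitting the training set across computing elements, running a prototype reduction method on each partition, and integrating the partial reduced sets into one. The following is a minimal single-process Python sketch of that flow, not the authors' implementation. All names are hypothetical, the per-class centroid is only a stand-in for a real prototype reduction method, and plain concatenation is assumed as the integration strategy; the record does not specify which strategies the paper uses.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# An instance is (feature vector, class label); this layout is an assumption.
Instance = Tuple[Tuple[float, ...], str]


def split(data: Sequence[Instance], n_parts: int) -> List[List[Instance]]:
    """Deal instances round-robin into n_parts disjoint partitions."""
    return [list(data[i::n_parts]) for i in range(n_parts)]


def reduce_partition(part: Sequence[Instance]) -> List[Instance]:
    """Map-side stand-in: keep one centroid per class found in the partition."""
    groups: Dict[str, List[Tuple[float, ...]]] = defaultdict(list)
    for features, label in part:
        groups[label].append(features)
    prototypes: List[Instance] = []
    for label, rows in sorted(groups.items()):
        centroid = tuple(sum(col) / len(rows) for col in zip(*rows))
        prototypes.append((centroid, label))
    return prototypes


def join(partials: Sequence[Sequence[Instance]]) -> List[Instance]:
    """Reduce-side stand-in: concatenate the partial reduced sets."""
    return [proto for partial in partials for proto in partial]


def reduce_prototypes(data: Sequence[Instance], n_parts: int) -> List[Instance]:
    """Partition, reduce each partition independently, then integrate."""
    return join([reduce_partition(part) for part in split(data, n_parts)])


def classify_1nn(prototypes: Sequence[Instance], query: Sequence[float]) -> str:
    """Label a query with the class of its nearest prototype (squared Euclidean)."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda proto: sq_dist(proto[0], query))[1]
```

On a real cluster the per-partition calls would run as independent map tasks; here they run sequentially to keep the sketch self-contained.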

Files in this item

Name: triguero-peralta-bacardit-garc ...
Size: 432.8 KB
Format: PDF

This item appears in the following collection(s)

DCCIA - Artículos

Except where otherwise noted, this item's license is described as Atribución-NoComercial-SinDerivadas 3.0 España