Evolutionary Feature Selection for Big Data Classification: A MapReduce Approach

Peralta, Daniel; Río García, Sara del; Ramírez-Gallego, Sergio; Triguero, Isaac; Benítez Sánchez, José Manuel; Herrera Triguero, Francisco

doi:10.1155/2015/246139

dc.contributor.author	Peralta, Daniel
dc.contributor.author	Río García, Sara del
dc.contributor.author	Ramírez-Gallego, Sergio
dc.contributor.author	Triguero, Isaac
dc.contributor.author	Benítez Sánchez, José Manuel
dc.contributor.author	Herrera Triguero, Francisco
dc.date.accessioned	2015-12-09T12:53:57Z
dc.date.available	2015-12-09T12:53:57Z
dc.date.issued	2015
dc.identifier.citation	Peralta, D.; et al. Evolutionary Feature Selection for Big Data Classification: A MapReduce Approach. Mathematical Problems in Engineering, 2015: 246139 (2015). [doi: 10.1155/2015/246139]	es_ES
dc.identifier.issn	1024-123X
dc.identifier.issn	1563-5147
dc.identifier.uri	http://hdl.handle.net/10481/39134
dc.description.abstract	Nowadays, many disciplines have to deal with big datasets that additionally involve a high number of features. Feature selection methods aim at eliminating noisy, redundant, or irrelevant features that may deteriorate the classification performance. However, traditional methods do not scale well enough to cope with datasets of millions of instances and produce successful results within a limited time. This paper presents a feature selection algorithm based on evolutionary computation that uses the MapReduce paradigm to obtain subsets of features from big datasets. The algorithm decomposes the original dataset into blocks of instances to learn from them in the map phase; then, the reduce phase merges the obtained partial results into a final vector of feature weights, which allows a flexible application of the feature selection procedure using a threshold to determine the selected subset of features. The feature selection method is evaluated using three well-known classifiers (SVM, Logistic Regression, and Naive Bayes) implemented within the Spark framework to address big data problems. In the experiments, datasets of up to 67 million instances and up to 2000 attributes have been handled, showing that this is a suitable framework to perform evolutionary feature selection, improving both the classification accuracy and its runtime when dealing with big data problems.	es_ES
dc.description.sponsorship	This work is supported by the Research Projects TIN2014-57251-P, P10-TIC-6858, P11-TIC-7765, P12-TIC-2958, and TIN2013-47210-P. D. Peralta and S. Ramírez-Gallego hold two FPU scholarships from the Spanish Ministry of Education and Science (FPU12/04902, FPU13/00047). I. Triguero holds a BOF postdoctoral fellowship from Ghent University.	es_ES
dc.language.iso	eng	es_ES
dc.publisher	Hindawi Publishing Corporation	es_ES
dc.rights	Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/	es_ES
dc.subject	Algorithm	es_ES
dc.subject	Datasets	es_ES
dc.subject	Classification	es_ES
dc.subject	Instance selection	es_ES
dc.title	Evolutionary Feature Selection for Big Data Classification: A MapReduce Approach	es_ES
dc.type	info:eu-repo/semantics/article	es_ES
dc.rights.accessRights	info:eu-repo/semantics/openAccess	es_ES
dc.identifier.doi	10.1155/2015/246139
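
The abstract in this record describes a three-step flow: split the dataset into blocks of instances, learn a partial result from each block in the map phase, then merge those results in the reduce phase into a vector of feature weights that a threshold turns into the selected subset. The following is a minimal sketch of that flow only, under stated assumptions. It assumes each block yields a binary keep/drop mask and that merging means averaging the masks; the record does not state either detail. The function toy_selector is a hypothetical stand-in that compares class means; it is not the evolutionary search used in the paper, and every name below is illustrative.

```python
from functools import reduce
from typing import Callable, List, Sequence, Tuple

Instance = Tuple[Sequence[float], int]  # (feature values, class label 0 or 1)
Mask = List[int]                        # 1 = feature kept, 0 = feature dropped


def split_into_blocks(data: Sequence[Instance], n_blocks: int) -> List[List[Instance]]:
    """Deal instances round-robin into n_blocks disjoint blocks."""
    return [list(data[i::n_blocks]) for i in range(n_blocks)]


def toy_selector(block: Sequence[Instance]) -> Mask:
    """Hypothetical stand-in for the per-block search (NOT evolutionary).

    Keeps a feature when its mean differs between class 0 and class 1
    by more than 0.5; drops it if either class is missing from the block.
    """
    n_features = len(block[0][0])
    mask: Mask = []
    for j in range(n_features):
        zeros = [x[j] for x, y in block if y == 0]
        ones = [x[j] for x, y in block if y == 1]
        if not zeros or not ones:
            mask.append(0)
            continue
        gap = abs(sum(ones) / len(ones) - sum(zeros) / len(zeros))
        mask.append(1 if gap > 0.5 else 0)
    return mask


def map_phase(blocks: Sequence[Sequence[Instance]],
              selector: Callable[[Sequence[Instance]], Mask]) -> List[Mask]:
    """Run the selector independently on every block."""
    return [selector(block) for block in blocks]


def reduce_phase(masks: Sequence[Mask]) -> List[float]:
    """Assumed merge rule: average the binary masks into per-feature weights."""
    summed = reduce(lambda a, b: [x + y for x, y in zip(a, b)], masks)
    return [s / len(masks) for s in summed]


def select_features(weights: Sequence[float], threshold: float) -> List[int]:
    """Return the indices of features whose weight reaches the threshold."""
    return [j for j, w in enumerate(weights) if w >= threshold]


# Tiny synthetic dataset: feature 0 equals the label, feature 1 is constant.
labels = [0, 0, 1, 1, 0, 0, 1, 1]
data = [((float(y), 0.3), y) for y in labels]

blocks = split_into_blocks(data, n_blocks=2)
weights = reduce_phase(map_phase(blocks, toy_selector))
selected = select_features(weights, threshold=0.5)
```

Because the threshold is applied only after the merge, the same weight vector can be reused with different thresholds without repeating the map phase, which is one way to read the "flexible application" mentioned in the abstract.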

Files in this item

Name: Peralta_BigDataClassification.pdf
Size: 2.164Mb
Format: PDF

This item appears in the following collection(s)

DCCIA - Artículos


Except where otherwise noted, this item's license is described as Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License