Universidad de Granada Digibug
 

Please use this identifier to cite or link to this item: http://hdl.handle.net/10481/39134

Title: Evolutionary Feature Selection for Big Data Classification: A MapReduce Approach
Authors: Peralta, Daniel
Río, Sara del
Ramírez-Gallego, Sergio
Triguero, Isaac
Benítez Sánchez, José Manuel
Herrera, Francisco
Issue Date: 2015
Abstract: Many disciplines now have to deal with big datasets that also involve a high number of features. Feature selection methods aim to eliminate noisy, redundant, or irrelevant features that may degrade classification performance. However, traditional methods do not scale well enough to cope with datasets of millions of instances and produce useful results within a limited time. This paper presents a feature selection algorithm based on evolutionary computation that uses the MapReduce paradigm to obtain subsets of features from big datasets. The algorithm decomposes the original dataset into blocks of instances and learns from them in the map phase; the reduce phase then merges the partial results into a final vector of feature weights, which allows flexible application of the feature selection procedure by using a threshold to determine the selected subset of features. The feature selection method is evaluated using three well-known classifiers (SVM, Logistic Regression, and Naive Bayes) implemented within the Spark framework to address big data problems. The experiments handle datasets of up to 67 million instances and up to 2000 attributes, showing that this is a suitable framework for evolutionary feature selection that improves both classification accuracy and runtime on big data problems.
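Illustration (not part of the original record): the map/reduce flow described in the abstract can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The per-block selector `map_block` is a hypothetical stand-in that compares class means and does not perform an evolutionary search. Averaging the binary vectors in the reduce step is likewise an assumption about how partial results might be merged. All function names and the `min_gap` parameter are invented for this example.

```python
from functools import reduce
from typing import List, Sequence, Tuple

# One instance: (feature values, binary class label). Hypothetical layout.
Instance = Tuple[Sequence[float], int]


def split_into_blocks(rows: List[Instance], n_blocks: int) -> List[List[Instance]]:
    """Decompose the dataset into disjoint blocks of instances."""
    return [rows[i::n_blocks] for i in range(n_blocks)]


def map_block(block: List[Instance], min_gap: float = 0.5) -> List[int]:
    """Map phase: return a binary vector (1 = feature selected) for one block.

    ASSUMPTION: a simple class-mean gap test stands in for the evolutionary
    search that the abstract mentions; it is only a placeholder.
    """
    n_features = len(block[0][0])
    selected = []
    for j in range(n_features):
        pos = [x[j] for x, y in block if y == 1]
        neg = [x[j] for x, y in block if y == 0]
        if not pos or not neg:
            # Cannot compare classes in this block; do not select.
            selected.append(0)
            continue
        gap = abs(sum(pos) / len(pos) - sum(neg) / len(neg))
        selected.append(1 if gap > min_gap else 0)
    return selected


def reduce_vectors(a: List[int], b: List[int]) -> List[int]:
    """Reduce phase: element-wise sum of two partial selection vectors."""
    return [u + v for u, v in zip(a, b)]


def feature_weights(rows: List[Instance], n_blocks: int) -> List[float]:
    """Merge per-block vectors into one weight per feature, in [0, 1].

    ASSUMPTION: the weight is the fraction of blocks that selected the feature.
    """
    partial = [map_block(b) for b in split_into_blocks(rows, n_blocks)]
    summed = reduce(reduce_vectors, partial)
    return [s / len(partial) for s in summed]


def select_by_threshold(weights: List[float], threshold: float) -> List[int]:
    """Return the indices of features whose weight reaches the threshold."""
    return [j for j, w in enumerate(weights) if w >= threshold]
```

Under these assumptions, a higher threshold keeps only the features that most blocks agreed on, while a lower one yields a larger subset. That matches the flexibility the abstract attributes to the weight vector.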
Sponsorship: This work is supported by the Research Projects TIN2014-57251-P, P10-TIC-6858, P11-TIC-7765, P12-TIC-2958, and TIN2013-47210-P. D. Peralta and S. Ramírez-Gallego hold FPU scholarships from the Spanish Ministry of Education and Science (FPU12/04902 and FPU13/00047, respectively). I. Triguero holds a BOF postdoctoral fellowship from Ghent University.
Publisher: Hindawi Publishing Corporation
Keywords: Algorithm
Datasets
Classification
Instance selection
URI: http://hdl.handle.net/10481/39134
ISSN: 1024-123X
1563-5147
Rights: Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License
Citation: Peralta, D.; et al. Evolutionary Feature Selection for Big Data Classification: A MapReduce Approach. Mathematical Problems in Engineering, 2015: 246139 (2015). [http://hdl.handle.net/10481/39134]
Appears in Collections: DCCIA - Artículos

Files in This Item:

Peralta_BigDataClassification.pdf (Size: 2.22 MB, Format: Adobe PDF)

