Universidad de Granada Digibug
 


Please use this identifier to cite or link to this item: http://hdl.handle.net/10481/49266

Title: An insight into imbalanced Big Data classification: outcomes and challenges
Authors: Fernández Hilario, Alberto; Río, Sara del; Chawla, Nitesh V.; Herrera, Francisco
Issue Date: 2017
Abstract: Big Data applications have emerged in recent years, and researchers from many disciplines are aware of the significant advantages of extracting knowledge from this type of problem. However, traditional learning approaches cannot be applied directly because of scalability issues. To overcome this, the MapReduce framework has arisen as a “de facto” solution. Basically, it carries out a “divide-and-conquer” distributed procedure in a fault-tolerant way that is suited to commodity hardware. As this is still a recent discipline, little research has been conducted on imbalanced classification for Big Data. The main reason is the difficulty of adapting standard techniques to the MapReduce programming style. Additionally, intrinsic problems of imbalanced data, namely lack of data and small disjuncts, are accentuated when the data are partitioned to fit the MapReduce programming style. This paper is built on three main pillars. First, it presents the first outcomes for imbalanced classification in Big Data problems, introducing the current state of research in this area. Second, it analyzes the behavior of standard pre-processing techniques in this particular framework. Finally, taking into account the experimental results obtained throughout this work, it discusses the challenges and future directions for the topic.
Sponsorship: This work has been partially supported by the Spanish Ministry of Science and Technology under Projects TIN2014-57251-P and TIN2015-68454-R, the Andalusian Research Plan P11-TIC-7765, the Foundation BBVA Project 75/2016 BigDaPTOOLS, and the National Science Foundation (NSF) Grant IIS-1447795.
Publisher: Springer
Keywords: Big data; Imbalanced classification; MapReduce; Pre-processing; Sampling
URI: http://hdl.handle.net/10481/49266
ISSN: 2198-6053
Rights: Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License
Citation: Fernández Hilario, A.; et al. An insight into imbalanced Big Data classification: outcomes and challenges. Complex and Intelligent Systems, 3(2): 105-120 (2017). [http://hdl.handle.net/10481/49266]
Appears in Collections: DCCIA - Artículos

Files in This Item:

File: FernandezHilario_BigData.pdf
Size: 726.51 kB
Format: Adobe PDF
