From big to smart data: Iterative ensemble filter for noise filtering in big data classification
Metadata
Author
García Gil, Diego Jesús; Luque‐Sánchez, Francisco; Luengo Martín, Julián; García López, Salvador; Herrera Triguero, Francisco
Publisher
International Journal of Intelligent Systems
Subject
Big Data, class noise, classification, Smart Data, ensemble
Date
2019-10-09
Citation
García‐Gil, D., Luque‐Sánchez, F., Luengo, J., García, S., & Herrera, F. (2019). From big to smart data: Iterative ensemble filter for noise filtering in big data classification. International Journal of Intelligent Systems, 34(12), 3260-3274.
Sponsor
Spanish national Research Project, Grant/Award Number: TIN2017‐89517‐P
Abstract
The quality of data is directly related to the quality of the models drawn from it. For that reason, much research is devoted to improving data quality and amending the errors data may contain. One of the most common problems is the presence of noise in classification tasks, where noise refers to the incorrect labeling of training instances. This problem is very disruptive, as it changes the decision boundaries of the problem. Big Data problems pose a new challenge in terms of data quality due to the massive and unsupervised accumulation of data. The Big Data scenario also creates new problems for classic data preprocessing algorithms, which are not prepared to work with such amounts of data, even though these algorithms are key to moving from Big to Smart Data. In this paper, an iterative ensemble filter for removing noisy instances in Big Data scenarios is proposed. Experiments carried out on six Big Data datasets have shown that our noise filter outperforms the current state-of-the-art noise filter in Big Data domains. It has also proved to be an effective solution for transforming raw Big Data into Smart Data.
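To make the general idea of iterative ensemble noise filtering concrete, here is a minimal, single-machine sketch. It is not the authors' algorithm or their Big Data implementation. It assumes a common scheme: split the data into folds, let an ensemble of classifiers trained on the other folds vote on each held-out instance, drop instances that the majority misclassifies, and repeat until an iteration removes only a small fraction of the data. The base classifiers (1-NN, 3-NN, nearest centroid), the parameters `folds`, `max_iter`, `stop_ratio` and `seed`, and all function names are hypothetical choices made for illustration.

```python
import random
from collections import Counter

# Hypothetical base classifiers (illustration only, not the paper's ensemble):
# each takes a training list of (features, label) pairs and a feature vector.

def _sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def one_nn(train, x):
    """1-nearest-neighbour prediction."""
    return min(train, key=lambda t: _sq_dist(t[0], x))[1]

def three_nn(train, x):
    """Majority vote among the 3 nearest neighbours."""
    ranked = sorted(train, key=lambda t: _sq_dist(t[0], x))
    return Counter(label for _, label in ranked[:3]).most_common(1)[0][0]

def nearest_centroid(train, x):
    """Predict the class whose mean feature vector is closest to x."""
    sums, counts = {}, Counter()
    for feats, label in train:
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, v in enumerate(feats):
            acc[i] += v
        counts[label] += 1
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
    return min(centroids, key=lambda lab: _sq_dist(centroids[lab], x))

ENSEMBLE = [one_nn, three_nn, nearest_centroid]

def iterative_ensemble_filter(data, folds=5, max_iter=10, stop_ratio=0.01, seed=0):
    """Repeatedly drop instances that most ensemble members misclassify.

    Each instance is judged only by classifiers trained on the other folds,
    so a mislabeled point cannot vote for its own (wrong) label.
    """
    rng = random.Random(seed)
    data = list(data)
    for _ in range(max_iter):
        order = list(range(len(data)))
        rng.shuffle(order)
        noisy = set()
        for f in range(folds):
            held_out = set(order[f::folds])
            train = [data[i] for i in order if i not in held_out]
            if not train:
                continue
            for i in held_out:
                feats, label = data[i]
                errors = sum(clf(train, feats) != label for clf in ENSEMBLE)
                if errors > len(ENSEMBLE) / 2:  # majority vote flags it as noisy
                    noisy.add(i)
        before = len(data)
        data = [d for i, d in enumerate(data) if i not in noisy]
        # Stop once an iteration removes only a small fraction of the data.
        if len(noisy) <= stop_ratio * before:
            break
    return data
```

Majority voting is one design choice among several. A consensus rule (flag an instance only if every classifier errs) removes fewer clean instances but also keeps more noise. Iterating lets later rounds judge each instance against a training set that earlier rounds have already cleaned.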