A Flexible Big Data System for Credibility-Based Filtering of Social Media Information According to Expertise
Metadata
Author
Díaz García, José Ángel; Gutiérrez Batista, Karel; Fernández Basso, Carlos Jesús; Ruiz Jiménez, María Dolores; Martín Bautista, María José
Publisher
Springer Nature
Subject
Social media mining; Pre-processing; Big data
Date
2024-04-15
Bibliographic reference
Diaz-Garcia, J.A., Gutiérrez-Batista, K., Fernandez-Basso, C. et al. A Flexible Big Data System for Credibility-Based Filtering of Social Media Information According to Expertise. Int J Comput Intell Syst 17, 93 (2024). [https://doi.org/10.1007/s44196-024-00483-y]
Sponsor
FederaMed project: Grant PID2021-123960OB-I00 funded by MICIU/AEI/10.13039/501100011033; ERDF/EU; BIGDATAMED projects with references B-TIC-145-UGR18 and P18-RT-2947; European Union NextGenerationEU/PRTR, grant PLEC2021-007681 funded by MCIN/AEI/10.13039/501100011033; DESINFOSCAN project. Ministerio de Ciencia e Innovacion; European Union NextGenerationEU (Grant TED2021-1289402B-C21); NOFACEPS project (PPJIB2021-04) of the University of Granada; Ministry of Universities through the EU-funded Margarita Salas Programme; Spanish Ministry of Education, Culture and Sport (FPU18/00150); Administration of the Junta de Andalucía
Abstract
Social networks have taken on an irreplaceable role as sources of information, and millions of people use them daily
to follow current events. As a result, the amount of content on social networks has become unmanageable and, in many
cases, fake or non-credible. Correct pre-processing is therefore necessary to obtain knowledge and value from these
data sets. In this paper, we propose a new Big Data pre-processing technique that addresses two key issues of the
Big Data paradigm: the validity and credibility of the data, and its volume. The system is a Spark-based filter that
flexibly selects credible users related to a given topic under analysis, reducing the volume of data and keeping only
data that are valid for the problem under study. The proposed system combines word embeddings with other text mining
and natural language processing techniques. The system has been validated using three real-world use cases.
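
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of filtering users by embedding similarity to a topic. It is not the authors' method: it uses plain Python rather than Spark, a hand-made toy embedding table instead of trained word embeddings, and it assumes that a user's profile description is the text being compared. The names `TOY_EMBEDDINGS`, `embed`, `cosine`, `filter_users`, and the threshold value are all hypothetical.

```python
import math

# Toy 3-dimensional "word embeddings" (hypothetical values, illustration only).
TOY_EMBEDDINGS = {
    "doctor": [0.9, 0.1, 0.0],
    "nurse": [0.8, 0.2, 0.0],
    "health": [1.0, 0.0, 0.0],
    "football": [0.0, 1.0, 0.0],
    "fan": [0.1, 0.9, 0.1],
}


def embed(text):
    """Average the vectors of the known words in text; None if no word is known."""
    vectors = [TOY_EMBEDDINGS[w] for w in text.lower().split() if w in TOY_EMBEDDINGS]
    if not vectors:
        return None
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_users(users, topic, threshold=0.8):
    """Keep users whose profile text is close to the topic (assumed criterion)."""
    topic_vec = embed(topic)
    if topic_vec is None:
        return []
    kept = []
    for name, bio in users.items():
        bio_vec = embed(bio)
        if bio_vec is not None and cosine(bio_vec, topic_vec) >= threshold:
            kept.append(name)
    return kept


# Hypothetical users and profile descriptions.
users = {
    "alice": "doctor and nurse",
    "bob": "football fan",
    "carol": "unknown words only",
}
credible = filter_users(users, "health")
print(credible)
```

Raising or lowering `threshold` trades the amount of data removed against the risk of discarding relevant users; in a distributed setting the same per-user comparison could be applied as a filter over a partitioned data set.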