A Flexible Big Data System for Credibility-Based Filtering of Social Media Information According to Expertise

Díaz García, José Ángel; Gutiérrez Batista, Karel; Fernández Basso, Carlos Jesús; Ruiz Jiménez, María Dolores; Martín Bautista, María José

Social media mining; Pre-processing; Big data

Social networks have taken on an irreplaceable role as sources of information, and millions of people use them daily to follow current issues. As a result of this success, the amount of content on social networks has become unmanageable, and much of it is fake or non-credible. Proper pre-processing of the data is therefore necessary to obtain knowledge and value from these data sets. In this paper, we propose a new data pre-processing technique based on Big Data that addresses two key concerns of the Big Data paradigm: the validity and credibility of the data, and its volume. The system is a Spark-based filter that flexibly selects credible users related to a given topic under analysis, reducing the volume of data and keeping only the data that are valid for the problem under study. The proposed system combines word embeddings with other text mining and natural language processing techniques. The system has been validated using three real-world use cases.

2024-05-16T07:31:02Z
2024-04-15
journal article

Diaz-Garcia, J.A., Gutiérrez-Batista, K., Fernandez-Basso, C. et al. A Flexible Big Data System for Credibility-Based Filtering of Social Media Information According to Expertise. Int J Comput Intell Syst 17, 93 (2024). [https://doi.org/10.1007/s44196-024-00483-y]

https://hdl.handle.net/10481/91841
10.1007/s44196-024-00483-y
eng
info:eu-repo/grantAgreement/EU/NextGenerationEU/TED2021-1289402B-C21
http://creativecommons.org/licenses/by/4.0/
open access
Attribution 4.0 International
Springer Nature
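
The filtering idea summarized in the abstract (keeping only users whose profile is semantically close to the topic under analysis) can be illustrated with a minimal sketch. This is not the authors' Spark implementation: it runs in plain Python on a single machine, and the toy vectors in `EMBEDDINGS`, the helper names `mean_vector`, `cosine`, and `filter_users`, the use of user biographies as input, and the `threshold` value are all assumptions made for illustration only.

```python
import math

# Hypothetical 3-dimensional word vectors standing in for a trained
# word-embedding model (assumption: the real system uses learned embeddings).
EMBEDDINGS = {
    "doctor": [0.9, 0.1, 0.0],
    "nurse": [0.8, 0.2, 0.0],
    "vaccine": [0.85, 0.15, 0.05],
    "health": [0.9, 0.2, 0.1],
    "football": [0.0, 0.9, 0.1],
    "fan": [0.1, 0.8, 0.2],
}


def mean_vector(words):
    """Average the vectors of the known words; return None if none is known."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        return None
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_users(users, topic_terms, threshold=0.8):
    """Keep the ids of users whose biography is close to the topic terms.

    `users` maps a user id to a free-text biography (hypothetical input shape).
    """
    topic_vec = mean_vector(topic_terms)
    if topic_vec is None:
        return []
    kept = []
    for user_id, bio in users.items():
        bio_vec = mean_vector(bio.lower().split())
        if bio_vec is not None and cosine(bio_vec, topic_vec) >= threshold:
            kept.append(user_id)
    return kept


users = {
    "u1": "Doctor and nurse educator",
    "u2": "Football fan",
    "u3": "Coffee lover",
}
credible = filter_users(users, ["vaccine", "health"])
```

With these toy vectors, only the user whose biography mentions health-related roles passes the threshold. In a distributed setting the same per-user scoring would presumably be applied across partitions of the data, but that mapping is an assumption rather than something the record describes.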