NOFACE: A new framework for irrelevant content filtering in social media according to credibility and expertise

Díaz García, José Ángel; Ruiz Jiménez, María Dolores; Martín Bautista, María José

doi:10.1016/j.eswa.2022.118063

Identifiers

URI: http://hdl.handle.net/10481/76708

DOI: 10.1016/j.eswa.2022.118063

Publisher

Elsevier

Subject

Social media mining

Pre-processing

Credibility

Word embeddings

Date

2022-07-13

Bibliographic reference

J. Angel Diaz-Garcia, M. Dolores Ruiz, Maria J. Martin-Bautista, NOFACE: A new framework for irrelevant content filtering in social media according to credibility and expertise, Expert Systems with Applications, Volume 208, 2022, 118063, ISSN 0957-4174, [https://doi.org/10.1016/j.eswa.2022.118063]

Sponsor

European Commission 786687; Andalusian government FEDER operative program P18-RT-2947 B-TIC-145-UGR18; University of Granada's internal plan PPJIB2021-04; Spanish Government FPU18/00150

Abstract

Social networks have taken on an irreplaceable role in our lives. Millions of people use them daily to communicate and stay informed. This success has also brought a large volume of irrelevant content, and even misinformation, to social media. In this paper, we propose a user-centred framework that reduces the amount of irrelevant content in social networks in order to support later stages of data mining processes. The system also helps reduce misinformation, since it selects credible and reputable users. It is based on the premise that if a user is credible, then their content will be credible. In a first stage, our proposal uses word embeddings to build a set of interesting users according to their expertise. In a later stage, it employs social network metrics to narrow down the relevant users further according to their credibility in the network. To validate the framework, we tested it on two real Big Data problems on Twitter: one related to COVID-19 tweets and the other to the United States elections of 3 November 2020. In both problems, finding relevant content may be difficult because of the large amount of data published in recent years. The proposed framework, called NOFACE, reduces the number of irrelevant users posting about a topic by keeping only those with higher credibility, and thus yields interesting information about the selected topic. This reduces irrelevant information and thereby mitigates the presence of misinformation when a data mining method is subsequently applied, improving the results obtained, as illustrated for the two topics using clustering, association rules and LDA techniques.
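
The two-stage idea described in the abstract (expertise via word embeddings, then credibility via social network metrics) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which embeddings, similarity measure, thresholds or network metrics NOFACE uses. The toy vectors in `EMBEDDINGS`, the cosine threshold, the follower-to-following ratio used as a credibility score, and all function and field names are assumptions made only for illustration.

```python
import math

# Toy 3-dimensional word vectors (hypothetical values; a real system would
# load pretrained word embeddings instead).
EMBEDDINGS = {
    "virus": [0.9, 0.1, 0.0],
    "vaccine": [0.8, 0.2, 0.1],
    "doctor": [0.7, 0.3, 0.0],
    "epidemiologist": [0.9, 0.2, 0.0],
    "football": [0.0, 0.1, 0.9],
    "fan": [0.1, 0.0, 0.8],
}


def mean_vector(words):
    """Average the vectors of the known words; None if no word is known."""
    vecs = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vecs:
        return None
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expertise_filter(users, topic_words, min_similarity=0.8):
    """Stage 1 (assumed form): keep users whose bio is close to the topic."""
    topic_vec = mean_vector(topic_words)
    if topic_vec is None:
        return []
    kept = []
    for user in users:
        bio_vec = mean_vector(user["bio"].lower().split())
        if bio_vec is not None and cosine(bio_vec, topic_vec) >= min_similarity:
            kept.append(user)
    return kept


def credibility_score(user):
    """Hypothetical credibility proxy: follower-to-following ratio."""
    return user["followers"] / (user["following"] + 1)


def credibility_filter(users, min_score=1.0):
    """Stage 2 (assumed form): keep users above a credibility threshold."""
    return [u for u in users if credibility_score(u) >= min_score]


def select_users(users, topic_words):
    """Run both stages and return the names of the selected users."""
    experts = expertise_filter(users, topic_words)
    return [u["name"] for u in credibility_filter(experts)]


# Invented example accounts, used only to exercise the two stages.
USERS = [
    {"name": "alice", "bio": "epidemiologist and doctor",
     "followers": 5000, "following": 300},
    {"name": "bob", "bio": "football fan",
     "followers": 8000, "following": 100},
    {"name": "carol", "bio": "vaccine doctor",
     "followers": 20, "following": 900},
]

if __name__ == "__main__":
    print(select_users(USERS, ["virus", "vaccine"]))
```

In this toy run, the first stage discards the off-topic account and the second discards the on-topic account with a low follower-to-following ratio, mirroring the order of the stages in the abstract. The framework's actual credibility metrics may differ substantially from this ratio.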

Collections

OpenAIRE (Open Access Infrastructure for Research in Europe)

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International