The blueprint of a new fact-checking system: A methodology to enrich RAG systems with new generated datasets
Identificadores
URI: https://hdl.handle.net/10481/107557Metadatos
Mostrar el registro completo del ítemAutor
Díaz García, José Ángel; López‑Joya, Salvador; Martín Bautista, María José; Ruiz Jiménez, María DoloresEditorial
Pergamon
Materia
fact checking RAG NLP lenguaje models datasets 
Fecha
2025-10-09Referencia bibliográfica
Lopez-Joya, S., Diaz-Garcia, J. A., Ruiz, M. D., & Martin-Bautista, M. J. (2025). The blueprint of a new fact-checking system: A methodology to enrich RAG systems with new generated datasets. Computers and Electrical Engineering, 128, 110746.
Patrocinador
The research reported in this paper was supported by the DesinfoScan project: Grant TED2021-129402B-C21 funded by MICIU/AEI/10.13039/501100011033 and by the European Union NextGenerationEU/PRTR, and FederaMed project: Grant PID2021-123960OB-I00 funded by MICIU/AEI/10.13039/501100011033 and by ERDF/EU. Finally, the research reported in this paper is also funded by the European Union (BAG-INTEL project, grant agreement no. 101121309).Resumen
In an era where digital misinformation spreads rapidly, Artificial Intelligence (AI) has become a crucial tool for fact-checking. However, the effectiveness of AI in this domain is often limited by the availability of high-quality and scalable datasets to train and guide algorithms. In this paper, we introduce VERIFAID (VERIfication FAISS-based framework for fake news Detection), a novel framework that improves fact-checking through a Retrieval-Augmented Generation (RAG) system based on automatically generated and dynamically growing datasets. Our approach improves evidence retrieval by building a scalable knowledge base, reducing the reliance on manually annotated data. The system consists of three key modules: two dedicated to dataset creation and one inference module that integrates advanced language models, such as LLaMA, within the RAG paradigm. To validate our methodology, we provide technical specifications for both the system and the dataset, together with comprehensive evaluations in zero-shot fact-checking scenarios. The results demonstrate the efficiency and adaptability of our approach and its potential to improve AI-driven fact verification at scale.





