dc.contributor.author: Mata, Jacinto
dc.contributor.author: Gualda, Estrella
dc.contributor.author: Pachón, Victoria
dc.contributor.author: Rebollo, Carolina
dc.contributor.author: Domínguez, Juan L.
dc.date.accessioned: 2025-10-22T08:39:35Z
dc.date.available: 2025-10-22T08:39:35Z
dc.date.issued: 2025-12
dc.identifier.citation: Mata, J., Gualda, E., Pachón, V., Rebollo, C., & Domínguez, J. L. (2025). From data to detection: Developing a corpus and training language models for the identification of anti-refugee narratives in Spanish. Array (New York, N.Y.), 28(100526), 100526. https://doi.org/10.1016/j.array.2025.100526 [es_ES]
dc.identifier.uri: https://hdl.handle.net/10481/107287
dc.description.abstract: This study addresses the automatic detection of anti-refugee messages in Spanish texts using language models based on pre-trained Transformer models. Despite numerous studies on hate speech detection, few have concentrated on Spanish, particularly regarding hostility towards refugees. To fill this gap, we developed HateRADAR-es, a new corpus of Spanish-language tweets manually annotated by expert sociologists and social workers to identify the presence or absence of hateful content directed at refugees. This dataset has been made available to the research community to encourage further investigation. We present a comprehensive, multi-stage experimental framework for obtaining language models that detect such messages with high efficacy. To address class imbalance in the data, data augmentation techniques are applied, and extensive experimentation is carried out to find the hyperparameter values that yield the best performance. In the evaluation, an ensemble of the fine-tuned models BETO, XLM-RoBERTa, and RoBERTa-large achieved the best results, with an accuracy of 0.891, an F1-measure of 0.860, and an AUC-ROC of 0.892. These findings underscore the effectiveness of combining multiple models into an ensemble to handle the complexity and nuances of hate speech on social media, offering a promising direction for future adaptations and applications of language models in specific hate contexts. [es_ES]
dc.description.sponsorship: MCIN/AEI/10.13039/501100011033 - FEDER/EU (PID2021-123983OB-I00, NON CONSPIRA HATE!) [es_ES]
dc.description.sponsorship: MCIN/AEI/10.13039/501100011033, European Union – NextGenerationEU/PRTR (JDC2022-048239-I) [es_ES]
dc.language.iso: eng [es_ES]
dc.publisher: Elsevier [es_ES]
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: Deep learning [es_ES]
dc.subject: Language models [es_ES]
dc.subject: Transformers [es_ES]
dc.title: From data to detection: Developing a corpus and training language models for the identification of anti-refugee narratives in Spanish [es_ES]
dc.type: journal article [es_ES]
dc.relation.projectID: info:eu-repo/grantAgreement/EU/PRTR/JDC2022-048239-I [es_ES]
dc.rights.accessRights: open access [es_ES]
dc.identifier.doi: 10.1016/j.array.2025.100526
dc.type.hasVersion: VoR [es_ES]


Files in this item

[PDF]

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International