<rdf:RDF xmlns:rdf="http://www.openarchives.org/OAI/2.0/rdf/" xmlns:ow="http://www.ontoweb.org/ontology/1#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:ds="http://dspace.org/ds/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:doc="http://www.lyncode.com/xoai" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/rdf/ http://www.openarchives.org/OAI/2.0/rdf.xsd">
   <ow:Publication rdf:about="oai:digibug.ugr.es:10481/107287">
      <dc:title>From data to detection: Developing a corpus and training language models for the identification of anti-refugee narratives in Spanish</dc:title>
      <dc:creator>Mata, Jacinto</dc:creator>
      <dc:creator>Gualda, Estrella</dc:creator>
      <dc:creator>Pachón, Victoria</dc:creator>
      <dc:creator>Rebollo, Carolina</dc:creator>
      <dc:creator>Domínguez, Juan L.</dc:creator>
      <dc:subject>Deep learning</dc:subject>
      <dc:subject>Language models</dc:subject>
      <dc:subject>Transformers</dc:subject>
      <dc:description>This study addresses the automatic detection of anti-refugee messages in Spanish texts using language
models based on pre-trained Transformers. Despite numerous studies on hate speech detection, few
have concentrated on Spanish, particularly regarding hostility towards refugees. To fill this gap, we developed
HateRADAR-es, a new corpus of Spanish-language tweets manually annotated by expert sociologists and social
workers to identify the presence or absence of hateful content directed at refugees. This dataset has been
made available to the research community to encourage further investigation. We present a comprehensive experimental
framework composed of several stages for obtaining language models that detect such messages with high
efficacy. To address class imbalance in the data, data
augmentation techniques are applied, and extensive experimentation is carried out to find the best
hyperparameter values for the language models. In the evaluation process, an
ensemble of the fine-tuned models BETO, XLM-RoBERTa, and RoBERTa-large achieved the best results, with
an accuracy of 0.891, an F1-measure of 0.860, and an AUC-ROC of 0.892. These findings underscore the
effectiveness of combining multiple models into an ensemble to handle the complexity and nuances of hate
speech on social media, offering a promising direction for future adaptations and applications of language
models in specific hate contexts.</dc:description>
      <dc:date>2025-10-22T08:39:35Z</dc:date>
      <dc:date>2025-12</dc:date>
      <dc:type>journal article</dc:type>
      <dc:identifier>Mata, J., Gualda, E., Pachón, V., Rebollo, C., &amp; Domínguez, J. L. (2025). From data to detection: Developing a corpus and training language models for the identification of anti-refugee narratives in Spanish. Array (New York, N.Y.), 28(100526), 100526. https://doi.org/10.1016/j.array.2025.100526</dc:identifier>
      <dc:identifier>https://hdl.handle.net/10481/107287</dc:identifier>
      <dc:identifier>10.1016/j.array.2025.100526</dc:identifier>
      <dc:language>eng</dc:language>
      <dc:relation>info:eu-repo/grantAgreement/EU/PRTR/JDC2022-048239-I</dc:relation>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:rights>open access</dc:rights>
      <dc:rights>Attribution-NonCommercial-NoDerivatives 4.0 International</dc:rights>
      <dc:publisher>Elsevier</dc:publisher>
   </ow:Publication>
</rdf:RDF>