
dc.contributor.author: Kharitonova, Ksenia
dc.contributor.author: Pérez Fernández, David
dc.contributor.author: Gutiérrez Hernando, Javier
dc.contributor.author: Gutiérrez Fandiño, Asier
dc.contributor.author: Callejas Carrión, Zoraida
dc.contributor.author: Griol Barres, David
dc.date.accessioned: 2025-09-10T11:18:32Z
dc.date.available: 2025-09-10T11:18:32Z
dc.date.issued: 2025-07-28
dc.identifier.citation: Kharitonova, K.; Pérez-Fernández, D.; Gutiérrez-Hernando, J.; Gutiérrez-Fandiño, A.; Callejas, Z.; Griol, D. EsCorpiusBias: The Contextual Annotation and Transformer-Based Detection of Racism and Sexism in Spanish Dialogue. Future Internet 2025, 17, 340. https://doi.org/10.3390/fi17080340 [es_ES]
dc.identifier.uri: https://hdl.handle.net/10481/106228
dc.description.abstract: The rise in online communication platforms has significantly increased exposure to harmful discourse, presenting ongoing challenges for digital moderation and user well-being. This paper introduces the EsCorpiusBias corpus, designed to enhance the automated detection of sexism and racism within Spanish-language online dialogue, specifically sourced from the Mediavida forum. By means of a systematic, context-sensitive annotation protocol, approximately 1000 three-turn dialogue units per bias category are annotated, ensuring the nuanced recognition of pragmatic and conversational subtleties. Here, annotation guidelines are meticulously developed, covering explicit and implicit manifestations of sexism and racism. Annotations are performed using the Prodigy tool (v1.16.0), resulting in moderate to substantial inter-annotator agreement (Cohen’s Kappa: 0.55 for sexism and 0.79 for racism). Models including logistic regression, spaCy’s baseline n-gram bag-of-words model, and transformer-based BETO are trained and evaluated, demonstrating that contextualized transformer-based approaches significantly outperform baseline and general-purpose models. Notably, the single-turn BETO model achieves an ROC-AUC of 0.94 for racism detection, while the contextual BETO model reaches an ROC-AUC of 0.87 for sexism detection, highlighting BETO’s superior effectiveness in capturing nuanced bias in online dialogues. Additionally, lexical overlap analyses indicate a strong reliance on explicit lexical indicators, highlighting limitations in handling implicit biases. This research underscores the importance of contextually grounded, domain-specific fine-tuning for effective automated detection of toxicity, providing robust resources and methodologies to foster socially responsible NLP systems within Spanish-speaking online communities. [es_ES]
dc.description.sponsorship: MCIN/AEI/10.13039/501100011033 (TED2021-132470B-I00) [es_ES]
dc.language.iso: eng [es_ES]
dc.publisher: MDPI [es_ES]
dc.rights: Attribution 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [*]
dc.subject: hate speech detection [es_ES]
dc.subject: bias [es_ES]
dc.subject: natural language processing [es_ES]
dc.subject: corpus annotation [es_ES]
dc.subject: sexism and racism detection [es_ES]
dc.subject: machine learning for toxicity [es_ES]
dc.title: EsCorpiusBias: The Contextual Annotation and Transformer-Based Detection of Racism and Sexism in Spanish Dialogue [es_ES]
dc.type: journal article [es_ES]
dc.rights.accessRights: open access [es_ES]
dc.identifier.doi: 10.3390/fi17080340
dc.type.hasVersion: VoR [es_ES]


Files in this item

[PDF]

This item appears in the following collection(s)


Except where otherwise noted, this item's license is described as Attribution 4.0 International