Multidimensional Affective Analysis for Low-resource Languages: A Use Case with Guarani-Spanish Code-switching Language
Metadatos
Mostrar el registro completo del ítemMateria
Natural language processing Sentiment analysis Affective analysis
Fecha
2023Patrocinador
2020 Leonardo Grant for Researchers and Cultural Creators from the FBBVA; Grant SCANNER-UDC (PID2020-113230RB-C21) funded by MCIN/AEI/10.13039/501100011033; European Research Council (ERC), which has supported this research under the European Union’s Horizon Europe research and innovation programme (SALSA, grant agreement No 101100615); Xunta de Galicia (ED431C 2020/11), and Centro de Investigación de Galicia “CITIC”, funded by Xunta de Galicia and the European Union (ERDF - Galicia 2014-2020 Program) Grant ED431G 2019/01; University of Granada, Generalitat Valenciana and the University of Alicante (IDIFEDER/2020/003)Resumen
This paper focuses on text-based affective computing for Jopara, a code-switching language that combines Guarani and Spanish. First, we collected a dataset of tweets primarily written in Guarani and annotated them for three widely used dimensions in sentiment analysis: (a) emotion recognition, (b) humor detection, and (c) offensive language identification. Then, we developed several neural network models, including large language models specifically designed for Guarani, and compared their performance against off-the-shelf multilingual and Spanish pre-trained models for the aforementioned dimensions. Our experiments show that language models incorporating Guarani during pre-training or pre-fine-tuning consistently achieve the best results, despite limited resources (a single 24-GB GPU and only 800K tokens). Notably, even a Guarani BERT model with just two layers of Transformers shows a favorable balance between accuracy and computational power, likely due to the inherent low-resource nature of the task. We present a comprehensive overview of corpus creation and model development for low-resource languages like Guarani, particularly in the context of its code-switching with Spanish, resulting in Jopara. Our findings shed light on the challenges and strategies involved in analyzing affective language in such linguistic contexts.