Offensive Language Detection in Arabic Social Networks Using Evolutionary-Based Classifiers Learned From Fine-Tuned Embeddings

Shannaq, Fatima; Castillo Valdivieso, Pedro Ángel

doi:10.1109/ACCESS.2022.3190960

dc.contributor.author	Shannaq, Fatima
dc.contributor.author	Castillo Valdivieso, Pedro Ángel
dc.date.accessioned	2022-09-07T09:30:55Z
dc.date.available	2022-09-07T09:30:55Z
dc.date.issued	2022-07-14
dc.identifier.citation	F. Shannaq... [et al.]. "Offensive Language Detection in Arabic Social Networks Using Evolutionary-Based Classifiers Learned From Fine-Tuned Embeddings," in IEEE Access, vol. 10, pp. 75018-75039, 2022, doi: [10.1109/ACCESS.2022.3190960]	es_ES
dc.identifier.uri	http://hdl.handle.net/10481/76565
dc.description.abstract	Social networks facilitate communication between people all over the world. Unfortunately, their excessive use has led to a rise in antisocial behaviors such as the spread of offensive language online, cyberbullying (CB), and hate speech (HS). Detecting abusive/offensive and hateful content has therefore become a crucial part of combating cyberharassment. Manual detection of cyberharassment is cumbersome, slow, and infeasible for rapidly growing data. In this study, we address the challenges of automatically detecting offensive tweets in the Arabic language. The main contribution of this study is the design and implementation of an intelligent prediction system built on a two-stage optimization approach that distinguishes offensive from non-offensive text. In the first stage, the proposed approach fine-tunes pre-trained word embedding models by training them for several epochs on the training dataset; embeddings for the vocabulary of the new dataset are trained and added to the existing embeddings. In the second stage, it employs a hybrid of two classifiers, namely XGBoost and SVM, with a genetic algorithm (GA) to mitigate the classifiers' difficulty in finding optimal hyperparameter values. We tested the proposed approach on the Arabic Cyberbullying Corpus (ArCybC), which contains tweets collected from four Twitter domains: gaming, sports, news, and celebrities. The ArCybC dataset has four categories: sexual, racial, intelligence, and appearance. The proposed approach produced superior results: the SVM algorithm with the Aravec SkipGram word embedding model achieved an accuracy of 88.2% and an F1-score of 87.8%.	es_ES
dc.description.sponsorship	Ministerio Espanol de Ciencia e Innovacion (DemocratAI::UGR) PID2020-115570GB-C22	es_ES
dc.language.iso	eng	es_ES
dc.publisher	IEEE	es_ES
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject	Arabic harassment dataset	es_ES
dc.subject	Deep learning	es_ES
dc.subject	Evolutionary algorithm	es_ES
dc.subject	Fine-tuned word embedding	es_ES
dc.subject	Hate speech	es_ES
dc.subject	Offensive language	es_ES
dc.subject	Optimization	es_ES
dc.title	Offensive Language Detection in Arabic Social Networks Using Evolutionary-Based Classifiers Learned From Fine-Tuned Embeddings	es_ES
dc.type	journal article	es_ES
dc.rights.accessRights	open access	es_ES
dc.identifier.doi	10.1109/ACCESS.2022.3190960
dc.type.hasVersion	VoR	es_ES
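
The second stage described in the abstract, a genetic algorithm searching for classifier hyperparameters, can be illustrated with a minimal sketch. This is not the authors' implementation: the search space (log10 ranges for an RBF-SVM's C and gamma), the operators (tournament selection, uniform crossover, Gaussian mutation, elitism), and every name below are assumptions chosen for illustration. The fitness function is a toy stand-in; in practice it would be replaced by something like a cross-validated F1-score of the classifier trained on the fine-tuned embeddings.

```python
import math
import random

# Hypothetical search space: log10 ranges for an RBF-SVM's C and gamma.
BOUNDS = {"log_C": (-2.0, 3.0), "log_gamma": (-4.0, 1.0)}


def toy_fitness(ind):
    """Stand-in for a validation score; peaks at log_C=1, log_gamma=-2."""
    return math.exp(-((ind["log_C"] - 1.0) ** 2 + (ind["log_gamma"] + 2.0) ** 2))


def random_individual(rng):
    """Sample one candidate uniformly inside the search bounds."""
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}


def tournament(pop, scores, rng, k=3):
    """Return the best of k randomly chosen individuals."""
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: scores[i])]


def crossover(a, b, rng):
    """Uniform crossover: each gene comes from either parent."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in BOUNDS}


def mutate(ind, rng, rate=0.3, sigma=0.3):
    """Gaussian mutation, clipped to the search bounds."""
    out = dict(ind)
    for k, (lo, hi) in BOUNDS.items():
        if rng.random() < rate:
            out[k] = min(hi, max(lo, out[k] + rng.gauss(0.0, sigma)))
    return out


def genetic_search(fitness, pop_size=30, generations=40, seed=0):
    """Evolve a population and return (best individual, best fitness)."""
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        elite = pop[max(range(pop_size), key=lambda i: scores[i])]
        children = [elite]  # elitism: carry the best individual over unchanged
        while len(children) < pop_size:
            p1 = tournament(pop, scores, rng)
            p2 = tournament(pop, scores, rng)
            children.append(mutate(crossover(p1, p2, rng), rng))
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)


best_params, best_score = genetic_search(toy_fitness)
print({k: round(v, 2) for k, v in best_params.items()}, round(best_score, 3))
```

Because the best individual is copied unchanged into each new generation, the best score never decreases from one generation to the next. Searching in log space is a common choice for C and gamma, whose useful values span several orders of magnitude.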

Files in this item

Name: Offensive_Language_Detection.pdf
Size: 2.364Mb
Format: PDF

This item appears in the following collection(s)

DICAR - Artículos

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International