A hybrid TwinSVM-HHO model for multilingual spam review detection using sentiment features and pre-trained embeddings
Identifiers
URI: https://hdl.handle.net/10481/110424
Publisher
Elsevier
Subject
Multilingual analysis; SPAM detection; SPAM Review; Sentiment Analysis; Support Vector Machines; SVM; Harris Hawk Optimization; HHO; Embedding
Date
2025-08-25
Bibliographic reference
Ala’ M. Al-Zoubi, Antonio M. Mora, Hossam Faris, Raneem Qaddoura, A hybrid TwinSVM-HHO model for multilingual spam review detection using sentiment features and pre-trained embeddings, Expert Systems with Applications, Volume 287, 2025, 128160, ISSN 0957-4174, https://doi.org/10.1016/j.eswa.2025.128160. (https://www.sciencedirect.com/science/article/pii/S0957417425017804)
Abstract
The detection of spam reviews in multilingual environments remains a challenging task due to linguistic diversity, data imbalance, and semantic complexity. This paper proposes a novel hybrid model that integrates Twin Support Vector Machine (TwinSVM) with Harris Hawks Optimization (HHO) for simultaneous parameter optimization and feature selection. To enhance semantic understanding, sentiment-based features are incorporated alongside pre-trained word embedding models—BERT, FastText, and MUSE—across English, Arabic, and Spanish datasets. Our approach generates 24 high-quality datasets using embeddings with 100 and 400 dimensions, including a combined multilingual set. Experimental results demonstrate that our proposed HHO-TwinSVM model consistently outperforms conventional classifiers and metaheuristic-enhanced SVMs, achieving accuracy improvements of up to 9.44 % and enhanced robustness in low-resource languages. This integrated framework represents a scalable and adaptable solution for multilingual spam detection. Four detailed experiments were conducted in this study, each designed to address and demonstrate a specific aspect of the proposed approach. Across all experiments, the method outperformed existing algorithms, achieving impressive accuracy rates of 92.9741 %, 89.0314 %, 80.3580 %, and 85.0859 % on Arabic, English, Spanish, and multilingual datasets, respectively. Subsequently, sentiment analysis features were incorporated to further enhance detection performance, resulting in improvements of 1.0994 %, 2.6674 %, 9.4430 %, and 8.7448 %, respectively. A comprehensive analysis of the experimental results, including the influence of reviews and sentiment features, is also presented.





