A Multilingual Spam Reviews Detection Based on Pre-Trained Word Embedding and Weighted Swarm Support Vector Machines
Metadata
Publisher
IEEE Xplore
Subject
Security Detection Spam reviews Pre-trained Word embedding Weighted SVM COVID-19 Multilingual
Date
2023-07-10
Bibliographic reference
A. M. Al-Zoubi, A. M. Mora and H. Faris, "A Multilingual Spam Reviews Detection Based on Pre-Trained Word Embedding and Weighted Swarm Support Vector Machines," in IEEE Access, vol. 11, pp. 72250-72271, 2023, [doi: 10.1109/ACCESS.2023.3293641]
Sponsor
Projects TED2021-129938B-I0; PID2020-113462RB-I00, PDC2022-133900-I00; PID2020-115570GB-C22, granted by Ministerio Español de Ciencia e Innovación; MCIN/AEI/10.13039/501100011033; MCIN/AEI; Next GenerationEU/PRTR
Abstract
Online reviews are important information that customers seek when deciding to buy products or
services. Organizations also benefit from these reviews as essential feedback on their products or services.
Such information must be reliable, especially since the Covid-19 pandemic, which brought a massive
increase in online reviews due to quarantine and staying at home. Not only did the number of reviews grow,
but the context and preferences they express also changed during the pandemic. Spam reviewers have
therefore adapted to these changes and improved their deception techniques. Spam reviews usually consist of
misleading, fake, or fraudulent reviews that aim to deceive customers in order to make money or harm competitors.
Hence, this work presents a Weighted Support Vector Machine (WSVM) combined with Harris Hawks Optimization
(HHO) for spam review detection. HHO serves as the algorithm for optimizing hyperparameters and
feature weighting. Corpora in three languages, namely English, Spanish, and Arabic, have been used as
datasets in order to address the multilingual problem in spam reviews. Moreover, a pre-trained word embedding
(BERT) has been applied alongside three word representation methods (NGram-3, TFIDF, and One-hot
encoding). Four experiments have been conducted, each focused on solving and demonstrating a different
aspect. In all experiments, the proposed approach showed excellent results compared with other state-of-the-art
algorithms. Specifically, WSVM-HHO achieved accuracies of 88.163%, 71.913%, 89.565%,
and 84.270% for the English, Spanish, Arabic, and Multilingual datasets, respectively. Further, a deep analysis
has been conducted to investigate the context of reviews before and after the COVID-19 situation. In addition,
a new dataset with statistical features has been generated and merged with the previous textual features
to improve detection performance.
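
The idea of letting one optimizer tune both the SVM hyperparameters and a per-feature weight vector can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: plain random search stands in for HHO, a small linear SVM trained by subgradient descent stands in for the paper's SVM, and the toy feature vectors, the candidate encoding `[C, w_1, ..., w_d]`, and all function names are hypothetical.

```python
import random

def train_linear_svm(X, y, C, epochs=50, lr=0.01):
    """Subgradient descent on the L2-regularised hinge loss; labels are +1/-1."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            for j in range(d):
                grad = w[j] / len(X)           # share of the regulariser
                if margin < 1:                 # sample violates the margin
                    grad -= C * yi * xi[j]
                w[j] -= lr * grad
            if margin < 1:
                b += lr * C * yi
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

def apply_weights(X, feature_weights):
    """Scale every feature column by its weight (the 'weighted' part)."""
    return [[fw * xj for fw, xj in zip(feature_weights, xi)] for xi in X]

def fitness(candidate, X_train, y_train, X_val, y_val):
    """Validation accuracy of an SVM built from candidate = [C, w_1..w_d]."""
    C, feature_weights = candidate[0], candidate[1:]
    w, b = train_linear_svm(apply_weights(X_train, feature_weights), y_train, C)
    X_val_w = apply_weights(X_val, feature_weights)
    hits = sum(predict(w, b, x) == t for x, t in zip(X_val_w, y_val))
    return hits / len(y_val)

def random_search(X_train, y_train, X_val, y_val, n_candidates=20, seed=0):
    """Stand-in for HHO: sample candidates at random and keep the fittest."""
    rng = random.Random(seed)
    d = len(X_train[0])
    best_candidate, best_score = None, -1.0
    for _ in range(n_candidates):
        C = 10 ** rng.uniform(-2, 2)           # log-uniform in [0.01, 100]
        candidate = [C] + [rng.random() for _ in range(d)]
        score = fitness(candidate, X_train, y_train, X_val, y_val)
        if score > best_score:
            best_candidate, best_score = candidate, score
    return best_candidate, best_score

# Hypothetical toy data: three numeric features per review, +1 = spam.
X_train = [[1.0, 0.2, 0.9], [0.9, 0.1, 0.1], [0.8, 0.3, 0.5],
           [0.1, 0.2, 0.8], [0.2, 0.1, 0.2], [0.0, 0.3, 0.6]]
y_train = [1, 1, 1, -1, -1, -1]
X_val = [[0.95, 0.2, 0.3], [0.85, 0.1, 0.7], [0.15, 0.2, 0.4], [0.05, 0.3, 0.9]]
y_val = [1, 1, -1, -1]

if __name__ == "__main__":
    candidate, score = random_search(X_train, y_train, X_val, y_val)
    print("best C:", candidate[0])
    print("best feature weights:", candidate[1:])
    print("validation accuracy:", score)
```

In a population-based optimizer such as HHO, each individual would carry the same kind of candidate vector, and `fitness` would be the quantity being maximized; only the way new candidates are proposed would differ from the random sampling used here.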





