Spam Reviews Detection Models in Multilingual Contexts applying Sentiment Analysis, Metaheuristics, and Advanced Word Embedding

Al Zoubi, Ala' Mahmoud Mohammed

dc.contributor.advisor	Mora García, Antonio Miguel
dc.contributor.advisor	Faris, Hossam
dc.contributor.author	Al Zoubi, Ala' Mahmoud Mohammed
dc.contributor.other	Universidad de Granada. Programa de Doctorado en Tecnologías de la Información y Comunicación	es_ES
dc.date.accessioned	2024-04-23T08:05:30Z
dc.date.available	2024-04-23T08:05:30Z
dc.date.issued	2024
dc.date.submitted	2024-03-15
dc.identifier.citation	Al Zoubi, Ala' Mahmoud Mohammed. Spam Reviews Detection Models in Multilingual Contexts applying Sentiment Analysis, Metaheuristics, and Advanced Word Embedding. Granada: Universidad de Granada, 2024. [https://hdl.handle.net/10481/91051]	es_ES
dc.identifier.isbn	9788411952767
dc.identifier.uri	https://hdl.handle.net/10481/91051
dc.description.abstract	Online reviews are a type of information that comes from consumers’ experiences after using products or services. Many individuals and organizations treat these reviews as a source of evaluation that they can use to their own benefit. For example, prospective customers read reviews to learn about the quality of products and services, while organizations view them as feedback about their products. As a result, reviews can easily affect businesses’ reputations, either positively or negatively. The COVID-19 pandemic led to a marked increase in the number of online reviews, with a 50% uptick observed between 2020 and 2023. This can be attributed to the widespread adoption of remote technology and the increased time spent at home compared to pre-pandemic levels. In addition to the increase in numbers, the COVID-19 pandemic also led to changes in the style, structure, context, and preferences of online reviews. However, not all reviews are genuine; some are written to mislead consumers in order to make a profit or damage competitors. Such reviews are known as spam reviews, and they can cause considerable harm, including financial losses and job losses. Therefore, it is necessary to identify these fraudulent reviews. Spam review detection is an important aspect of online security, as it aims to mitigate the potential negative effects of such reviews, including manipulation of online reputation and consumer fraud. Additionally, it helps to protect the integrity of online marketplaces and prevents the promotion of malicious sites through spam reviews. One of the well-recognized methods used to detect spam reviews is machine learning, specifically through the application of Natural Language Processing (NLP). Machine learning algorithms that incorporate NLP techniques have proven to be efficient methods for text classification across various tasks, such as sentiment analysis and spam review detection.
In this thesis, three different approaches are presented to detect spam reviews and identify the sentiment of reviews. The first approach combines the Weighted Support Vector Machine (WSVM) with a swarm intelligence algorithm based on the prey and predator behaviors of soft and hard besiege for spam review detection, where the optimization algorithm performs feature weighting and hyperparameter optimization simultaneously. The combination of SVM and metaheuristic algorithms has shown excellent performance in the literature for solving different problems. To enhance this performance, a weighted method has been implemented in conjunction with a metaheuristic algorithm. The first approach is applied to test the performance of the proposed WSVM in spam review detection and compare it with other existing approaches. The second approach employs Twin SVM (TwinSVM) and a swarm intelligence algorithm based on the special chain movement method to address a different problem, namely analyzing the sentiment of reviews. TwinSVM is a newer version of the standard SVM that offers several advantages over it and incorporates additional parameters. It was chosen for its superior performance compared to traditional SVM, particularly in mitigating overfitting, improving generalization, handling imbalanced datasets, enhancing robustness, and adapting to complex structures. The swarm-based algorithm is used in this approach to optimize the TwinSVM parameters. Further, we used the second approach for sentiment prediction of the same reviews and then compared it with the previous approach (WSVM) and other approaches. In the final approach, we combined the findings from the previous two approaches to pursue the optimal outcome.
That is, sentiment analysis was applied to improve spam review detection based on the combination of the prey and predator behaviors of the swarm-based algorithm and TwinSVM. Here, the metaheuristic algorithm is applied to optimize the TwinSVM parameters and to perform feature selection. In other words, this third approach combines the previous two applications (spam review detection and sentiment analysis) with the optimized version of SVM (TwinSVM) and a feature selection technique to enhance spam detection performance. Because reviews are written in several languages, three languages were taken into account when extracting reviews, namely English, Spanish, and Arabic. To convert text into data that machine learning algorithms can interpret, word representation methods were applied. Due to their ability to deal with multilingual text, three pre-trained word embedding models were used, namely Bidirectional Encoder Representations from Transformers (BERT), FastText, and Multilingual Unsupervised and Supervised Embeddings (MUSE). Consequently, twenty-four datasets were generated, covering the three languages plus a multilingual dataset created by combining them, each with two different dimensions (rows) (100 and 400). The experimental results of all approaches demonstrate the superiority of the proposed methods over other state-of-the-art algorithms. Specifically, the prey and predator-based algorithm with WSVM obtained the best accuracy in the first approach, with 88%, 71%, 89%, and 84% for the English, Spanish, Arabic, and Multilingual datasets, respectively. In the second approach, the special chain movement algorithm combined with TwinSVM outperformed the other algorithms on most datasets. More precisely, it achieved the best results on three of the four main datasets, with 87%, 86%, and 87% for the Arabic, English, and Multilingual datasets, respectively.
In the final approach, the prey and predator algorithm with TwinSVM-Fs achieved the highest accuracy compared with other algorithms, with 92%, 89%, 80%, and 85% for the Arabic, English, Spanish, and Multilingual datasets, respectively. Further, with the help of sentiment analysis, the results improved by 1.0994%, 2.6674%, 9.4430%, and 8.7448% for Arabic BERT-100, English MUSE-400, Spanish Fast-400, and Multi MUSE-400, respectively. At the end of each approach, a comprehensive analysis of the reviews’ style, context, and preferences is presented. Overall, the thesis proposes three effective approaches to detect spam reviews, study the sentiment of the reviews, and ultimately improve the detection of spam reviews using sentiment analysis, all within a multilingual environment.	es_ES
dc.description.sponsorship	Tesis Univ. Granada.	es_ES
dc.format.mimetype	application/pdf	en_US
dc.language.iso	eng	es_ES
dc.publisher	Universidad de Granada	es_ES
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 Internacional	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.title	Spam Reviews Detection Models in Multilingual Contexts applying Sentiment Analysis, Metaheuristics, and Advanced Word Embedding	es_ES
dc.type	doctoral thesis	es_ES
europeana.type	TEXT	en_US
europeana.dataProvider	Universidad de Granada. España.	es_ES
europeana.rights	http://creativecommons.org/licenses/by-nc-nd/3.0/	en_US
dc.rights.accessRights	open access	es_ES
dc.type.hasVersion	VoR	es_ES

Files in this item

Name: 95015.pdf
Size: 2.911Mb
Format: PDF

This item appears in the following collection(s)

Tesis
Tesis leídas en la Universidad de Granada

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 Internacional