dc.description.abstract | Online reviews are a type of information that comes from consumers’ experiences after
using products or services. Many individuals and organizations treat these reviews as a
source of evaluation. For example, prospective customers read reviews to judge the quality of
products and services, while organizations view them as feedback on their products. As a
result, reviews can strongly affect a business’s reputation, positively or negatively. The
COVID-19 pandemic led to a marked increase in the number of online reviews, with a 50% rise
observed between 2020 and 2023. This increase can be attributed to the widespread adoption of
remote technology and to more time spent at home than before the pandemic. In addition to
the increase in numbers, the COVID-19 pandemic also changed the style, structure, context,
and preferences of online reviews. However, not all reviews are genuine; some are written to
mislead consumers, either to make a profit or to damage competitors. Such reviews are known
as spam reviews, and they can cause considerable harm, including financial losses and job
losses. It is therefore necessary to identify these fraudulent reviews.
Spam review detection is an important aspect of online security, as it aims to mitigate the
potential negative effects of such reviews, including manipulation of online reputation and
consumer fraud. It also helps protect the integrity of online marketplaces and prevents the
promotion of malicious sites through spam reviews. One well-recognized way to detect spam
reviews is machine learning, specifically combined with Natural Language Processing (NLP).
Machine learning algorithms that incorporate NLP techniques have proven efficient for text
classification in various tasks, such as sentiment analysis and spam review detection. This
thesis presents three approaches for identifying spam reviews and the sentiment of reviews.
The first approach combines a Weighted Support Vector Machine (WSVM) with a swarm
intelligence algorithm based on the prey and predator behaviors of soft and hard besiege to
detect spam reviews; the optimization algorithm performs feature weighting and
hyperparameter optimization simultaneously. Combinations of SVM and metaheuristic
algorithms have shown excellent performance on a range of problems in the literature, and a
weighting method is implemented together with the metaheuristic algorithm to improve their
performance further. The first approach tests the performance of the proposed WSVM in spam
review detection and compares it with other existing approaches.
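One way to read "feature weighting and hyperparameter optimization simultaneously" is that each candidate solution of the metaheuristic encodes both the SVM settings and one weight per feature. The Python sketch below illustrates that reading under stated assumptions: the vector layout, the log-scale bounds, and the names `decode_candidate` and `weighted_rbf` are hypothetical, and the fitness evaluation (for example, cross-validated accuracy of an SVM trained with the decoded settings) is left out.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical search bounds (log10 scale) for the SVM hyperparameters.
LOG_C_BOUNDS = (-2.0, 3.0)      # C in [0.01, 1000]
LOG_GAMMA_BOUNDS = (-4.0, 1.0)  # gamma in [0.0001, 10]


def _scale(u: float, lo: float, hi: float) -> float:
    """Clip u to [0, 1] and map it linearly onto [lo, hi]."""
    return lo + min(max(u, 0.0), 1.0) * (hi - lo)


def decode_candidate(position: Sequence[float]) -> Tuple[float, float, List[float]]:
    """Split a position vector of length 2 + d into (C, gamma, feature weights).

    Assumed layout (hypothetical): position[0] -> C, position[1] -> gamma,
    position[2:] -> one weight per feature, clipped to [0, 1].
    """
    c = 10 ** _scale(position[0], *LOG_C_BOUNDS)
    gamma = 10 ** _scale(position[1], *LOG_GAMMA_BOUNDS)
    weights = [min(max(w, 0.0), 1.0) for w in position[2:]]
    return c, gamma, weights


def weighted_rbf(x: Sequence[float], z: Sequence[float],
                 weights: Sequence[float], gamma: float) -> float:
    """RBF kernel on feature-weighted inputs: exp(-gamma * sum_k (w_k * (x_k - z_k))^2)."""
    sq = sum((w * (a - b)) ** 2 for w, a, b in zip(weights, x, z))
    return math.exp(-gamma * sq)


if __name__ == "__main__":
    pos = [0.5, 0.5, 1.0, 0.0, 0.25]  # 2 hyperparameters + 3 feature weights
    C, gamma, w = decode_candidate(pos)
    print(C, gamma, w)
    # The second feature has weight 0, so its large difference is ignored.
    print(weighted_rbf([1.0, 5.0, 2.0], [0.0, -5.0, 2.0], w, gamma))
```

Because a weight of 0 removes a feature's contribution to the kernel distance, the same vector also lets the optimizer suppress uninformative features.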
The second approach employs Twin SVM (TwinSVM) and a swarm intelligence algorithm based on
a special chain movement to address a different problem: analyzing the sentiment of reviews.
TwinSVM was adopted because of its advantages over the traditional SVM, particularly in
mitigating overfitting, improving generalization, handling imbalanced datasets, enhancing
robustness, and adapting to complex structures. As a more recent variant of the standard SVM,
TwinSVM also incorporates additional parameters, and the swarm-based algorithm is used in
this approach to optimize them. The second approach was applied to predict the sentiment of
the same reviews, and its results were compared with those of the previous approach (WSVM)
and other approaches.
The final approach combines the findings of the previous two approaches to pursue the best
outcome: sentiment analysis is used to improve spam review detection, using TwinSVM together
with the swarm-based algorithm built on prey and predator behaviors. Here, the metaheuristic
algorithm optimizes the TwinSVM parameters and performs feature selection. In other words,
the third approach combines the two previous applications (spam review detection and
sentiment analysis) with TwinSVM and a feature selection technique to improve spam detection
performance.
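When a continuous swarm optimizer is used for feature selection, its positions must be turned into a binary include/exclude mask and scored. The Python sketch below uses an S-shaped (sigmoid) transfer function for the binarization and a weighted sum of classification error and selected-feature ratio as the fitness; both are common formulations in the wrapper feature-selection literature and are assumptions here, since the abstract does not state the exact scheme. The names `binarize` and `fs_fitness`, the default `alpha = 0.99`, and the error value used in the demo are hypothetical.

```python
import math
import random
from typing import List, Sequence


def sigmoid(v: float) -> float:
    """S-shaped transfer function mapping a continuous coordinate to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def binarize(position: Sequence[float], rng: random.Random) -> List[int]:
    """Select feature k (mask value 1) with probability sigmoid(position[k])."""
    return [1 if rng.random() < sigmoid(v) else 0 for v in position]


def fs_fitness(error_rate: float, mask: Sequence[int], alpha: float = 0.99) -> float:
    """Weighted sum of classification error and fraction of selected features (lower is better)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    ratio = sum(mask) / len(mask)
    return alpha * error_rate + (1.0 - alpha) * ratio


if __name__ == "__main__":
    rng = random.Random(42)
    mask = binarize([4.0, -4.0, 0.0, 6.0], rng)
    # error_rate would come from a classifier (e.g. TwinSVM) trained on the
    # selected features; 0.12 is a placeholder, not a result from the thesis.
    print(mask, fs_fitness(0.12, mask))
```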
Since reviews are written in many languages, reviews in three languages were collected:
English, Spanish, and Arabic. To convert text into data that machine learning algorithms can
interpret, word representation methods were applied. Given their ability to handle
multilingual text, three pre-trained word embedding models were used: Bidirectional Encoder
Representations from Transformers (BERT), FastText, and Multilingual Unsupervised and
Supervised Embeddings (MUSE). Applying these models to each of the three languages, and to a
multilingual dataset that combines them, at two dimensions (100 and 400 rows) produced
twenty-four datasets.
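The count of twenty-four follows from crossing three factors: four language settings (English, Spanish, Arabic, and the combined multilingual set), three embedding models, and two sizes, giving 4 × 3 × 2 = 24. The short Python enumeration below makes this explicit; the reading that every embedding model is applied to every language setting at both sizes is an assumption inferred from the abstract, and the `Language-Embedding-Size` naming scheme is hypothetical.

```python
from itertools import product
from typing import List

# Factors described in the abstract; the naming pattern is a hypothetical
# convention loosely following labels such as "Arabic BERT-100".
LANGUAGES = ["English", "Spanish", "Arabic", "Multilingual"]
EMBEDDINGS = ["BERT", "FastText", "MUSE"]
SIZES = [100, 400]


def dataset_names() -> List[str]:
    """Enumerate every language x embedding x size combination."""
    return [f"{lang}-{emb}-{size}"
            for lang, emb, size in product(LANGUAGES, EMBEDDINGS, SIZES)]


if __name__ == "__main__":
    names = dataset_names()
    print(len(names))
    print(names[:3])
```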
The experimental results of all approaches show that the proposed methods outperform other
state-of-the-art algorithms. Specifically, in the first approach, the prey and predator-based
algorithm with WSVM obtained the best accuracy: 88%, 71%, 89%, and 84% for the English,
Spanish, Arabic, and multilingual datasets, respectively. In the second approach, the special
chain movement algorithm combined with TwinSVM outperformed the other algorithms on most
datasets; more precisely, it achieved the best results on three of the four main datasets,
with 87%, 86%, and 87% for the Arabic, English, and multilingual datasets, respectively. In
the final approach, the prey and predator algorithm with TwinSVM-Fs achieved the highest
accuracy among the compared algorithms, with 92%, 89%, 80%, and 85% for the Arabic, English,
Spanish, and multilingual datasets, respectively. Furthermore, adding sentiment analysis
improved the results by 1.0994%, 2.6674%, 9.4430%, and 8.7448% for Arabic BERT-100, English
MUSE-400, Spanish Fast-400, and Multi MUSE-400, respectively. Each approach concludes with a
comprehensive analysis of the reviews’ style, context, and preferences.
Overall, the thesis proposes three effective approaches to detect spam reviews, to study the
sentiment of reviews, and ultimately to improve spam review detection using sentiment
analysis, all within a multilingual environment. | es_ES |