New applications of models based on imprecise probabilities within data mining
Metadata
Author: Moral García, Serafín
Publisher: Universidad de Granada
Department: Universidad de Granada. Programa de Doctorado en Tecnologías de la Información y de la Comunicación
Date: 2023
Defense date: 2022-12-21
Bibliographic reference: Moral García, Serafín. New applications of models based on imprecise probabilities within data mining. Granada: Universidad de Granada, 2023. [https://hdl.handle.net/10481/79149]
Sponsor: Tesis Univ. Granada.
Abstract
When we have information about a finite set of possible alternatives provided
by an expert or a dataset, a mathematical model is needed to represent such
information. In some cases, a single probability distribution is not appropriate
for this purpose because the available information is not sufficient. For this
reason, several mathematical theories and models based on imprecise probabilities
have been developed in the literature. In this thesis, we analyze the relations
between some imprecise probability theories and study the properties of some
models based on imprecise probabilities. When imprecise probability theories and
models are employed, tools for quantifying the uncertainty-based information they
convey, usually called uncertainty measures, are needed. In this thesis, we
analyze the properties of some existing uncertainty measures in theories based on
imprecise probabilities and propose new uncertainty measures for imprecise
probability theories and models that present some advantages over the existing ones.
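As an informal illustration of the kind of model and measure discussed above (a minimal Python sketch, not taken from the thesis), the following code builds the probability intervals of the imprecise Dirichlet model (IDM), a well-known imprecise probability model, from observed counts, and then computes the maximum Shannon entropy over the resulting credal set, a measure frequently used to quantify uncertainty-based information in this setting. The counts and the IDM hyperparameter s = 1 are illustrative assumptions.

    import numpy as np
    from scipy.optimize import minimize

    def idm_intervals(counts, s=1.0):
        """Lower/upper probabilities of the imprecise Dirichlet model (IDM)."""
        counts = np.asarray(counts, dtype=float)
        n = counts.sum()
        return counts / (n + s), (counts + s) / (n + s)

    def upper_entropy(lower, upper):
        """Maximum Shannon entropy (in bits) over the credal set given by the intervals."""
        def neg_entropy(p):
            p = np.clip(p, 1e-12, 1.0)      # avoid log(0)
            return np.sum(p * np.log2(p))   # negative Shannon entropy
        x0 = (lower + upper) / 2
        x0 = x0 / x0.sum()                  # feasible starting point
        res = minimize(neg_entropy, x0, method="SLSQP",
                       bounds=list(zip(lower, upper)),
                       constraints=[{"type": "eq", "fun": lambda p: p.sum() - 1.0}])
        return -res.fun

    counts = [12, 5, 3]                     # illustrative class counts from a dataset
    low, up = idm_intervals(counts, s=1.0)
    print("lower:", low.round(3), "upper:", up.round(3))
    print("upper entropy (bits):", round(upper_entropy(low, up), 3))

With more observations, the IDM intervals narrow and the maximum entropy decreases toward the entropy of the precise estimate.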
Situations in which it is necessary to represent the information provided
by a dataset about a finite set of possible alternatives arise in classification, an
essential task within Data Mining. This well-known task consists of predicting,
for a given instance described via a set of attributes, the value of a variable
under study, known as the class variable. In classification, it is often necessary to
quantify the uncertainty-based information about the class variable. For this
purpose, classical probability theory (PT) has been employed for many years.
In recent years, classification algorithms that represent the information about
the class variable via imprecise probability models have been developed.
Experimental studies have shown that classification methods based on imprecise
probabilities significantly outperform those that utilize PT when data contain errors.
When classifying an instance, classifiers usually predict a single value of the
class variable. Nonetheless, in some cases, there is not enough information
available for a classifier to single out one class value. In these situations, it
is more reasonable for the classifier to predict a set of class values instead of
a single value of the class variable. This is known as Imprecise Classification.
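To make the idea of a set-valued prediction concrete, the following minimal sketch (a generic illustration, not the particular method developed in the thesis) applies interval dominance to assumed probability intervals for the class values: a class value is discarded only when another class value has a lower probability bound greater than its upper bound, and all non-discarded values are returned.

    def non_dominated_classes(intervals):
        """Set-valued prediction by interval dominance.

        intervals: dict mapping each class value to (lower, upper) probability bounds.
        A class c is discarded if some other class d dominates it, i.e. lower(d) > upper(c).
        """
        prediction = set()
        for c, (_, up_c) in intervals.items():
            dominated = any(low_d > up_c for d, (low_d, _) in intervals.items() if d != c)
            if not dominated:
                prediction.add(c)
        return prediction

    # Illustrative probability intervals for three class values (assumed numbers).
    intervals = {"a": (0.45, 0.60), "b": (0.30, 0.50), "c": (0.05, 0.15)}
    print(non_dominated_classes(intervals))   # {'a', 'b'}: the data cannot separate a and b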
Classification algorithms (including Imprecise Classification) often aim to
minimize the number of erroneously classified instances. This would be optimal
if all classification errors had the same importance. Nevertheless, in practical applications, different classification errors usually lead to different costs.
For this reason, classifiers that take misclassification costs into account,
known as cost-sensitive classifiers, have been developed in the literature.
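The following sketch shows the cost-sensitive principle in its simplest, precise-probability form (the thesis combines it with imprecise models): given estimated class probabilities and a misclassification cost matrix, the prediction is the class that minimizes the expected cost rather than the most probable class. The probabilities and costs below are assumed for illustration only.

    import numpy as np

    # Estimated class probabilities for one instance (assumed values); classes 0, 1, 2.
    probs = np.array([0.55, 0.35, 0.10])

    # cost[i, j] = cost of predicting class j when the true class is i (assumed values).
    cost = np.array([[0.0, 1.0, 1.0],
                     [5.0, 0.0, 1.0],     # failing to detect class 1 is expensive
                     [1.0, 1.0, 0.0]])

    expected_cost = probs @ cost           # expected cost of each possible prediction
    print("expected costs:", expected_cost)                # [1.85, 0.65, 0.9]
    print("most probable class:", probs.argmax())          # 0
    print("minimum-cost class:", expected_cost.argmin())   # 1

Even though class 0 is the most probable, predicting class 1 is cheaper in expectation because misclassifying a true class 1 instance is heavily penalized.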
Traditional classification (including Imprecise Classification) assumes that
each instance has a single value of the class variable. However, this assumption
does not fit well in some domains, where an instance may be associated with
multiple labels simultaneously. In these domains, the Multi-Label Classification
(MLC) task is more suitable than traditional classification. MLC aims to predict
the set of labels associated with a given instance described via an attribute set.
Most of the MLC methods proposed so far represent the information provided by an
MLC dataset about the set of labels via classical PT.
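As a brief illustration of the multi-label setting (again a generic sketch rather than the thesis methods), each instance is associated with a binary indicator vector over the labels, and a simple baseline such as binary relevance trains one independent binary classifier per label. The toy data and the scikit-learn base classifier are assumptions made for the example.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    # Toy multi-label dataset (assumed values): 4 instances, 2 attributes, 3 labels.
    X = np.array([[0, 1], [1, 1], [1, 0], [0, 0]])
    Y = np.array([[1, 0, 1],   # each row encodes the label set of one instance
                  [1, 1, 0],
                  [0, 1, 0],
                  [0, 0, 1]])

    # Binary relevance baseline: one binary classifier per label, trained independently.
    models = [DecisionTreeClassifier().fit(X, Y[:, j]) for j in range(Y.shape[1])]

    x_new = np.array([[1, 1]])
    predicted = [int(m.predict(x_new)[0]) for m in models]
    label_set = {f"label_{j}" for j, v in enumerate(predicted) if v == 1}
    print(label_set)   # predicted label set for the new instance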
In this thesis, we develop new classification algorithms based on imprecise
probability models for these tasks, including Imprecise Classification,
cost-sensitive Imprecise Classification, and MLC; they present some advantages
over state-of-the-art methods and obtain better experimental results.

In this thesis, we follow the research line of imprecise probability theories
and models and of uncertainty measures with imprecise probabilities. We also
propose new classification methods based on imprecise probabilities that achieve
better performance than the state-of-the-art ones.