New applications of models based on imprecise probabilities within data mining
Author: Moral García, Serafín
Universidad de Granada
Department: Universidad de Granada. Programa de Doctorado en Tecnologías de la Información y de la Comunicación
Moral García, Serafín. New applications of models based on imprecise probabilities within data mining. Granada: Universidad de Granada, 2023. [https://hdl.handle.net/10481/79149]
Sponsorship: Tesis Univ. Granada.
When an expert or a dataset provides information about a finite set of possible alternatives, a mathematical model is needed to represent that information. In some cases, a single probability distribution is not appropriate for this purpose because the available information is insufficient. For this reason, several mathematical theories and models based on imprecise probabilities have been developed in the literature. In this thesis, we analyze the relations between some imprecise probability theories and study the properties of some models based on imprecise probabilities.
Imprecise probability theories and models call for tools that quantify the uncertainty-based information they represent, usually called uncertainty measures. In this thesis, we analyze the properties of some existing uncertainty measures in theories based on imprecise probabilities and propose new uncertainty measures for imprecise probability theories and models that present some advantages over the existing ones.
The need to represent the information that a dataset provides about a finite set of possible alternatives arises in classification, an essential task within Data Mining. This well-known task consists of predicting, for a given instance described via a set of attributes, the value of a variable under study, known as the class variable. Classification often requires quantifying the uncertainty-based information about the class variable. For this purpose, classical probability theory (PT) has been employed for many years. In recent years, classification algorithms that represent the information about the class variable via imprecise probability models have been developed. Experimental studies have shown that classification methods based on imprecise probabilities significantly outperform those that use PT when data contain errors.
When classifying an instance, classifiers usually predict a single value of the class variable. Nonetheless, in some cases, there is not enough information available for a classifier to point out a single class value. In these situations, it is more reasonable for the classifier to predict a set of class values instead of a single one. This is known as Imprecise Classification. Classification algorithms (including Imprecise Classification) often aim to minimize the number of instances erroneously classified. This would be optimal if all classification errors had the same importance. Nevertheless, in practical applications, different classification errors usually lead to different costs. For this reason, classifiers that take the misclassification costs into account, also known as cost-sensitive classifiers, have been developed in the literature.
Traditional classification (including Imprecise Classification) assumes that each instance has a single value of a class variable. However, in some domains, this assumption does not fit well because an instance may be associated with multiple labels simultaneously. In these domains, the Multi-Label Classification task (MLC) is more suitable than traditional classification. MLC aims to predict the set of labels associated with a given instance described via an attribute set. Most of the MLC methods proposed so far represent the information provided by an MLC dataset about the set of labels via classical PT. In this thesis, we develop new classification algorithms based on imprecise probability models, covering Imprecise Classification, cost-sensitive Imprecise Classification, and MLC, that present some advantages and obtain better experimental results than state-of-the-art methods.
In this thesis, we follow the line of research on imprecise probability theories and models and on uncertainty measures with imprecise probabilities.
We also propose new classification methods based on imprecise probabilities that achieve better performance than state-of-the-art methods.
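As an illustration of the Imprecise Classification idea described in the abstract, the following minimal Python sketch turns class counts into probability intervals and returns a set of class values rather than a single one. It is not taken from the thesis: the choice of the Imprecise Dirichlet Model (IDM) with hyperparameter `s`, the interval dominance criterion, and the function names `idm_intervals` and `non_dominated` are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]


def idm_intervals(counts: Dict[str, int], s: float = 1.0) -> Dict[str, Interval]:
    """Lower and upper probability of each class value.

    Assumption: intervals come from the Imprecise Dirichlet Model,
    [n_c / (N + s), (n_c + s) / (N + s)], with s > 0.
    """
    total = sum(counts.values())
    return {c: (n / (total + s), (n + s) / (total + s)) for c, n in counts.items()}


def non_dominated(intervals: Dict[str, Interval]) -> List[str]:
    """Class values that survive interval dominance (assumed criterion).

    A class value is dominated when some other class value has a lower
    probability strictly greater than its upper probability.
    """
    return sorted(
        c
        for c, (_, upper) in intervals.items()
        if not any(lower > upper for d, (lower, _) in intervals.items() if d != c)
    )


if __name__ == "__main__":
    # Hypothetical class counts observed for the instances matching a query.
    ambiguous = idm_intervals({"a": 2, "b": 2, "c": 0})
    clear = idm_intervals({"a": 8, "b": 1, "c": 1})
    print(non_dominated(ambiguous))  # set-valued prediction
    print(non_dominated(clear))      # single-valued prediction
```

With few or evenly split observations the intervals overlap and several class values are returned; with more decisive counts the prediction collapses to a single value.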