New applications of models based on imprecise probabilities within data mining

Moral García, Serafín

Identifiers

URI: https://hdl.handle.net/10481/79149

ISBN: 9788411176279

Publisher

Universidad de Granada

Supervisors

Abellán Mulero, Joaquín; Mantas Ruiz, Carlos Javier

Department

Universidad de Granada. Programa de Doctorado en Tecnologías de la Información y de la Comunicación

Date

2023

Defense date

2022-12-21

Bibliographic reference

Moral García, Serafín. New applications of models based on imprecise probabilities within data mining. Granada: Universidad de Granada, 2023. [https://hdl.handle.net/10481/79149]

Sponsor

Thesis, Univ. Granada.

Abstract

When an expert or a dataset provides information about a finite set of possible alternatives, a mathematical model is needed to represent that information. In some cases, a single probability distribution is not appropriate for this purpose because the available information is insufficient. For this reason, several mathematical theories and models based on imprecise probabilities have been developed in the literature. In this thesis, we analyze the relations between some imprecise probability theories and study the properties of some models based on imprecise probabilities. Once imprecise probability theories and models are used, tools for quantifying the uncertainty-based information they contain, usually called uncertainty measures, are needed. In this thesis, we analyze the properties of some existing uncertainty measures in theories based on imprecise probabilities and propose new uncertainty measures for imprecise probability theories and models that present some advantages over the existing ones. Situations in which it is necessary to represent the information provided by a dataset about a finite set of possible alternatives arise in classification, an essential task within Data Mining. This well-known task consists of predicting, for a given instance described via a set of attributes, the value of a variable under study, known as the class variable. In classification, it is often necessary to quantify the uncertainty-based information about the class variable. For this purpose, classical probability theory (PT) has been employed for many years. In recent years, classification algorithms that represent the information about the class variable via imprecise probability models have been developed. Experimental studies have shown that classification methods based on imprecise probabilities significantly outperform those that use PT when data contain errors.
When classifying an instance, classifiers tend to predict a single value of the class variable. Nonetheless, in some cases, there is not enough information available for a classifier to point out a single class value. In these situations, it is more reasonable for classifiers to predict a set of class values instead of a single value of the class variable. This is known as Imprecise Classification. Classification algorithms (including Imprecise Classification) often aim to minimize the number of instances erroneously classified. This would be optimal if all classification errors had the same importance. Nevertheless, in practical applications, different classification errors usually lead to different costs. For this reason, classifiers that take the misclassification costs into account, also known as cost-sensitive classifiers, have been developed in the literature. Traditional classification (including Imprecise Classification) assumes that each instance has a single value of a class variable. However, in some domains, this assumption does not fit well because an instance may be associated with multiple labels simultaneously. In these domains, the Multi-Label Classification task (MLC) is more suitable than traditional classification. MLC aims to predict the set of labels associated with a given instance described via an attribute set. Most of the MLC methods proposed so far represent the information provided by an MLC dataset about the set of labels via classical PT. In this thesis, we develop new classification algorithms based on imprecise probability models, including Imprecise Classification, cost-sensitive Imprecise Classification, and MLC, that present some advantages and obtain better experimental results than state-of-the-art methods.

In this thesis, we follow the line of research on imprecise probability theories and models and on uncertainty measures with imprecise probabilities. We also propose new classification methods based on imprecise probabilities that achieve better performance than state-of-the-art methods.
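
The idea of Imprecise Classification mentioned in the abstract, predicting a set of class values when the data do not support a single one, can be illustrated with a minimal sketch. This is not the thesis's algorithm: it assumes, for illustration only, that class frequencies are turned into probability intervals with the Imprecise Dirichlet Model (parameter `s`) and that the predicted set is obtained by interval dominance. The function names `idm_intervals` and `non_dominated_classes` are hypothetical.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]


def idm_intervals(counts: Dict[str, int], s: float = 1.0) -> Dict[str, Interval]:
    """Probability interval per class under the Imprecise Dirichlet Model.

    Assumption: lower = n_c / (N + s), upper = (n_c + s) / (N + s),
    where N is the total number of observations and s > 0.
    """
    total = sum(counts.values())
    return {c: (n / (total + s), (n + s) / (total + s)) for c, n in counts.items()}


def non_dominated_classes(intervals: Dict[str, Interval]) -> List[str]:
    """Keep every class that no other class dominates.

    Interval dominance (assumed criterion): class o dominates class c
    when the lower probability of o is strictly above the upper probability of c.
    """
    kept = []
    for c, (_, upper_c) in intervals.items():
        dominated = any(
            lower_o > upper_c for o, (lower_o, _) in intervals.items() if o != c
        )
        if not dominated:
            kept.append(c)
    return sorted(kept)


if __name__ == "__main__":
    # Plenty of evidence for one class: a single value is predicted.
    print(non_dominated_classes(idm_intervals({"a": 8, "b": 1, "c": 1})))
    # Two classes are indistinguishable: a set of values is predicted.
    print(non_dominated_classes(idm_intervals({"a": 3, "b": 3, "c": 0})))
```

With clear evidence the sketch returns one class; when two classes have overlapping intervals it returns both, which is the behavior an imprecise classifier is meant to have.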

Collections

Theses

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International