The impact of heterogeneous distance functions on missing data imputation and classification performance

[PDF] 1-s2.0-S0952197622000707-main.pdf (860.2 KB)
Identifiers
URI: https://hdl.handle.net/10481/100996
DOI: 10.1016/j.engappai.2022.104791
Authors
Seoane Santos, Miriam; Henriques Abreu, Pedro; Fernández Hilario, Alberto Luis; Luengo Martín, Julián
Publisher
Elsevier
Date
2022-03-24
Bibliographic reference
Engineering Applications of Artificial Intelligence, Volume 111, 104791
Abstract
This work performs an in-depth study of the impact of distance functions on K-Nearest Neighbours imputation of heterogeneous datasets. Missing data is generated at several percentages, on a large benchmark of 150 datasets (50 continuous, 50 categorical and 50 heterogeneous datasets) and data imputation is performed using different distance functions (HEOM, HEOM-R, HVDM, HVDM-R, HVDM-S, MDE and SIMDIST) and k values (1, 3, 5 and 7). The impact of distance functions on kNN imputation is then evaluated in terms of classification performance, through the analysis of a classifier learned from the imputed data, and in terms of imputation quality, where the quality of the reconstruction of the original values is assessed. By analysing the properties of heterogeneous distance functions over continuous and categorical datasets individually, we then study their behaviour over heterogeneous data. We discuss whether datasets with different natures may benefit from different distance functions and to what extent the component of a distance function that deals with missing values influences such choice. Our experiments show that missing data has a significant impact on distance computation and the obtained results provide guidelines on how to choose appropriate distance functions depending on data characteristics (continuous, categorical or heterogeneous datasets) and the objective of the study (classification or imputation tasks).
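To make the setting concrete, here is a minimal sketch of kNN imputation driven by HEOM (the Heterogeneous Euclidean-Overlap Metric), one of the distance functions named in the abstract. In HEOM, the per-attribute distance is 1 when either value is missing, the 0/1 overlap for categorical attributes, and the range-normalised absolute difference for continuous ones. Everything else below is an assumption made for illustration and is not taken from the paper: the function names `heom` and `knn_impute`, the encoding of missing values as `None`, and the choice to fill a cell with the donors' mean (continuous) or mode (categorical).

```python
import math
from collections import Counter

MISSING = None  # assumption: missing values are encoded as None


def heom(x, y, is_categorical, ranges):
    """HEOM distance between two records of equal length."""
    total = 0.0
    for a, (xa, ya) in enumerate(zip(x, y)):
        if xa is MISSING or ya is MISSING:
            d = 1.0  # maximal per-attribute distance when a value is unknown
        elif is_categorical[a]:
            d = 0.0 if xa == ya else 1.0  # overlap metric
        else:
            # range-normalised difference; a constant attribute contributes 0
            d = abs(xa - ya) / ranges[a] if ranges[a] > 0 else 0.0
        total += d * d
    return math.sqrt(total)


def knn_impute(rows, is_categorical, k=3):
    """Return a copy of rows with missing cells filled from the k nearest donors.

    Hypothetical helper: donors are rows that have the target attribute
    observed; distances are computed on the original (unimputed) rows.
    """
    n_attrs = len(is_categorical)
    ranges = []
    for a in range(n_attrs):
        observed = [r[a] for r in rows
                    if not is_categorical[a] and r[a] is not MISSING]
        ranges.append(max(observed) - min(observed) if observed else 0.0)

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for a in range(n_attrs):
            if row[a] is not MISSING:
                continue
            donors = sorted(
                (heom(row, other, is_categorical, ranges), j)
                for j, other in enumerate(rows)
                if j != i and other[a] is not MISSING
            )
            values = [rows[j][a] for _, j in donors[:k]]
            if not values:
                continue  # no donor available; leave the cell missing
            if is_categorical[a]:
                imputed[i][a] = Counter(values).most_common(1)[0][0]
            else:
                imputed[i][a] = sum(values) / len(values)
    return imputed


# Toy heterogeneous dataset: one continuous and one categorical attribute.
rows = [
    [1.0, "a"],
    [2.0, "a"],
    [None, "a"],
    [10.0, "b"],
    [9.0, None],
]
is_categorical = [False, True]
filled = knn_impute(rows, is_categorical, k=2)
```

Because a missing value always contributes the maximal per-attribute distance of 1, the way a distance function treats missing cells can change which neighbours are selected. This is the component whose influence the abstract says the study examines.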
Collections
  • DCCIA - Artículos
