REPROT: Explaining the predictions of complex deep learning architectures for object detection through reducts of an image
Metadata

Publisher
Elsevier

Subject
Deep learning; Visual explanation; Rough set theory; Reduct; Prototype image

Date
2024-01

Bibliographic reference
Bello, G. Nápoles, L. Concepción, et al. REPROT: Explaining the predictions of complex deep learning architectures for object detection through reducts of an image. Information Sciences 654 (2024) 119851. https://doi.org/10.1016/j.ins.2023.119851

Sponsor
MCIN/AEI/10.13039/501100011033; FEDER PID2021-122916NB-I00

Abstract
Although deep learning models can solve complex prediction problems, they have been criticized for being ‘black boxes’. This implies that their decisions are difficult, if not impossible, to explain by simply inspecting their internal knowledge structures. Explainable Artificial Intelligence has attempted to open the black box through model-specific and model-agnostic post-hoc methods that generate visualizations or derive associations between the problem features and the model predictions. This paper proposes a new method, termed REPROT, that explains the decisions of complex deep learning architectures based on local reducts of an image. A ‘reduct’ is a set of sufficiently descriptive features that can fully characterize the acquired knowledge. The created reducts are used to build a ‘prototype image’ that visually explains the inference obtained by a black-box model for an image. We focus on deep learning architectures whose complexity and internal particularities demand adapting existing model-specific explanation methods, making the explanation process more difficult. Experimental results show that the black-box model can detect an object using the prototype image generated from the reduct. Hence, the explanations will be given by “the minimum set of features sufficient for the neural model to detect an object”. The confidence scores obtained by architectures such as Inception, YOLO, and Mask R-CNN are higher for prototype images built from the reduct than for those built from the most important superpixels according to the LIME method. Moreover, the target object is not detected on several occasions through the LIME output, thus supporting the superiority of the proposed explanation method.
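The abstract does not give implementation details for building the prototype image, but the core idea it describes (keep only the superpixels belonging to the reduct and suppress the rest) can be illustrated with a minimal sketch. The function name `build_prototype_image`, the representation of superpixels as an integer label map, and the neutral gray fill value are assumptions for illustration, not the paper's actual implementation:

```python
import numpy as np

def build_prototype_image(image, segments, reduct, fill_value=127):
    """Hypothetical sketch: keep only the superpixels whose labels are in the
    reduct; replace every other pixel with a neutral fill value."""
    prototype = np.full_like(image, fill_value)
    mask = np.isin(segments, list(reduct))   # boolean map of reduct superpixels
    prototype[mask] = image[mask]            # copy original pixels inside the reduct
    return prototype

# Toy example: a 4x4 RGB "image" split into four 2x2 superpixels labelled 0..3.
image = np.arange(48, dtype=np.uint8).reshape(4, 4, 3)
segments = np.array([[0, 0, 1, 1],
                     [0, 0, 1, 1],
                     [2, 2, 3, 3],
                     [2, 2, 3, 3]])

# Suppose the reduct for some detection is the superpixel set {0, 3}.
prototype = build_prototype_image(image, segments, reduct={0, 3})
```

The resulting `prototype` would then be fed back to the black-box detector; per the abstract, a successful detection on this reduced input is what certifies the reduct as "the minimum set of features sufficient for the neural model to detect an object".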