Generalizing max pooling via (a, b)-grouping functions for Convolutional Neural Networks

Rodríguez Martínez, Iosu; Herrera Triguero, Francisco

doi:10.1016/j.inffus.2023.101893


Identifiers

URI: https://hdl.handle.net/10481/83517

DOI: 10.1016/j.inffus.2023.101893

Publisher

Elsevier

Subject

Convolutional neural networks

Grouping functions

Pooling functions

Image classification

Date

2023

Bibliographic reference

I. Rodriguez-Martinez et al. Generalizing max pooling via (a, b)-grouping functions for Convolutional Neural Networks. Information Fusion 99 (2023) 101893. [https://doi.org/10.1016/j.inffus.2023.101893]

Sponsor

Andalusian Excellence P18-FR-4961; Departamento de Universidad, Innovación y Transformación Digital; Conselho Nacional de Desenvolvimento Científico e Tecnológico 301618/2019-4 CNPq; Fundação de Amparo à Pesquisa do Estado do Rio Grande do Sul 19/2551-0001279-9 FAPERGS; Ministerio de Ciencia e Innovación AEI/10.13039/501100011033, PC095-096 FUSIPROD, PID2019-108392GB-I00 MICINN; Vedecká Grantová Agentúra MŠVVaŠ SR a SAV 1/0267/21 VEGA; Universidad Pública de Navarra UPNA

Abstract

Owing to their adaptability to varied settings and their effective optimization algorithm, Convolutional Neural Networks (CNNs) have set the state of the art in image processing tasks over the past decade. CNNs work sequentially, alternating between extracting significant features from an input image and aggregating those features locally through “pooling” functions to produce a more compact representation. This downsampling is commonly performed with functions such as the arithmetic mean or, more typically, the maximum. Although many studies have been devoted to alternative pooling algorithms, in practice “max-pooling” still matches or outperforms most of these alternatives, and it has become the standard for CNN construction. In this paper we focus on the properties that make the maximum such an efficient solution for CNN feature downsampling and propose replacing it with grouping functions, a family of functions that share those desirable properties. To adapt these functions to CNNs, we present (a,b)-grouping functions, an extension of grouping functions that works with real-valued data. We present different construction methods for (a,b)-grouping functions and demonstrate their empirical applicability by using them to replace the pooling function of many well-known CNN architectures, finding promising results.
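
To make the idea in the abstract concrete, the following is a minimal sketch of pooling with a grouping function lifted from [0, 1] to an interval [a, b]. It rests on assumptions that do not come from this record: the lifting is done with a simple linear rescaling, the grouping function is the probabilistic sum 1 - prod(1 - u_i), and all names (rescale, unscale, prob_sum, ab_grouping, pool2d) are hypothetical. The paper's own construction methods may differ.

```python
from math import prod


def rescale(x, a, b):
    """Map x from [a, b] onto [0, 1] with a linear increasing bijection."""
    return (x - a) / (b - a)


def unscale(u, a, b):
    """Inverse of rescale: map u from [0, 1] back onto [a, b]."""
    return a + u * (b - a)


def prob_sum(values):
    """n-ary probabilistic sum on [0, 1]: 1 - prod(1 - u_i)."""
    return 1.0 - prod(1.0 - u for u in values)


def ab_grouping(values, a, b, grouping=prob_sum):
    """Apply a [0, 1] grouping function to values assumed to lie in [a, b].

    Assumption for illustration: rescale to [0, 1], aggregate, map back.
    """
    return unscale(grouping([rescale(x, a, b) for x in values]), a, b)


def pool2d(feature_map, a, b, size=2, grouping=prob_sum):
    """Non-overlapping size x size pooling over a 2D list of floats."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(0, rows - size + 1, size):
        row = []
        for j in range(0, cols - size + 1, size):
            # Flatten the current window into a list of activations.
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(ab_grouping(window, a, b, grouping))
        pooled.append(row)
    return pooled
```

Passing grouping=max recovers ordinary max pooling, since the linear rescaling is increasing and therefore commutes with the maximum. With the probabilistic sum, a window returns a only when every activation equals a and returns b as soon as any activation equals b, which mirrors the boundary behaviour of the maximum.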

Collections

DCCIA - Articles

Except where otherwise noted, this item's license is described as Attribution 4.0 International