Speech emotion recognition via multiple fusion under spatial–temporal parallel network

Gan, Chenquan; García López, Salvador

doi:10.1016/j.neucom.2023.126623

1-s2.0-S0925231223007464-main.pdf (1.571Mb)

Identifiers

URI: https://hdl.handle.net/10481/85177

DOI: 10.1016/j.neucom.2023.126623

Publisher

Elsevier

Subject

Speech emotion recognition

Speech spectrum

Spatial–temporal parallel network

Multiple fusion

Date

2023-10-28

Bibliographic reference

C. Gan et al. Speech emotion recognition via multiple fusion under spatial–temporal parallel network. Neurocomputing 555 (2023) 126623. [https://doi.org/10.1016/j.neucom.2023.126623]

Sponsor

National Natural Science Foundation of China 61702066; Chongqing Research Program of Basic Research and Frontier Technology, China cstc2021jcyj-msxmX0761; MICINN/AEI/10.13039/501100011033: PID2020-119478GB-I00; FEDER/Junta de Andalucía A-TIC-434-UGR20

Abstract

Speech is a primary channel for expressing emotion and plays a vital role in human communication. As research on emotion recognition in human-computer interaction has deepened, speech emotion recognition (SER) has become an essential task for improving the interaction experience. Existing approaches to extracting emotion features from speech have two limitations. Methods that cut the speech spectrum into segments destroy the continuity of the speech. Methods that keep the spectrum whole but use a cascaded structure cannot extract spectrum information from the temporal and spatial domains simultaneously. To address both problems, we propose a spatial–temporal parallel network for speech emotion recognition that operates on the uncut speech spectrum. To further mix the temporal and spatial features, we design a novel fusion method, called multiple fusion, that combines concatenation fusion with an ensemble strategy. Experimental results on five datasets demonstrate that the proposed method outperforms state-of-the-art methods.
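
As a rough illustration of how concatenation fusion and an ensemble strategy can be combined over two parallel branches, the following minimal sketch may help. It is not the authors' implementation: the abstract does not specify the branch architectures, the classifiers, or the ensemble rule, so the linear heads, the random feature vectors, and the plain averaging of class probabilities are all assumptions made here for illustration, and the function name `multiple_fusion` is hypothetical.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def multiple_fusion(spatial_feat, temporal_feat, heads):
    """Hypothetical sketch: concatenation fusion plus an averaging ensemble.

    spatial_feat:  (batch, d_s) features from a spatial branch (assumed given)
    temporal_feat: (batch, d_t) features from a temporal branch (assumed given)
    heads: dict of weight matrices for three linear classifiers (assumed)
    """
    # Concatenation fusion: join both feature vectors along the feature axis.
    fused_feat = np.concatenate([spatial_feat, temporal_feat], axis=-1)

    # One classifier per view: spatial only, temporal only, concatenated.
    p_spatial = softmax(spatial_feat @ heads["spatial"])
    p_temporal = softmax(temporal_feat @ heads["temporal"])
    p_fused = softmax(fused_feat @ heads["fused"])

    # Ensemble step, assumed here to be a plain average of probabilities.
    return (p_spatial + p_temporal + p_fused) / 3.0

# Random stand-ins for branch outputs; sizes are arbitrary.
rng = np.random.default_rng(0)
batch, d_s, d_t, n_classes = 4, 8, 6, 5
spatial_feat = rng.normal(size=(batch, d_s))
temporal_feat = rng.normal(size=(batch, d_t))
heads = {
    "spatial": rng.normal(size=(d_s, n_classes)),
    "temporal": rng.normal(size=(d_t, n_classes)),
    "fused": rng.normal(size=(d_s + d_t, n_classes)),
}
probs = multiple_fusion(spatial_feat, temporal_feat, heads)
predicted = probs.argmax(axis=-1)  # one emotion class index per utterance
```

Because both branches receive the same uncut spectrum and run side by side, neither depends on the other's output, which is the contrast with a cascaded design described in the abstract.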

Collections

DCCIA - Artículos

Except where otherwise noted, this item's license is described as Attribution 4.0 International