Show simple item record
Speech emotion recognition via multiple fusion under spatial–temporal parallel network
dc.contributor.author | Gan, Chenquan | |
dc.contributor.author | García López, Salvador | |
dc.date.accessioned | 2023-10-23T10:15:04Z | |
dc.date.available | 2023-10-23T10:15:04Z | |
dc.date.issued | 2023-10-28 | |
dc.identifier.citation | C. Gan et al. Speech emotion recognition via multiple fusion under spatial–temporal parallel network. Neurocomputing 555 (2023) 126623. https://doi.org/10.1016/j.neucom.2023.126623 | es_ES |
dc.identifier.uri | https://hdl.handle.net/10481/85177 | |
dc.description | The authors are grateful to the anonymous reviewers and the editor for their valuable comments and suggestions. This work was supported by the National Natural Science Foundation of China (No. 61702066) and the Chongqing Research Program of Basic Research and Frontier Technology, China (No. cstc2021jcyj-msxmX0761), and partially supported by Project PID2020-119478GB-I00, funded by MICINN/AEI/10.13039/501100011033, and by Project A-TIC-434-UGR20, funded by FEDER/Junta de Andalucía, Consejería de Transformación Económica, Industria, Conocimiento y Universidades. | es_ES |
dc.description.abstract | Speech, as a necessary way to express emotions, plays a vital role in human communication. As research on emotion recognition in human-computer interaction continues to deepen, speech emotion recognition (SER) has become an essential task for improving the human-computer interaction experience. When extracting emotion features from speech, methods that cut the speech spectrum destroy the continuity of the speech signal. Moreover, methods that use a cascaded structure without cutting the speech spectrum cannot extract spectrum information from the temporal and spatial domains simultaneously. To this end, we propose a spatial–temporal parallel network for speech emotion recognition that does not cut the speech spectrum. To further mix the temporal and spatial features, we design a novel fusion method (called multiple fusion) that combines concatenate fusion with an ensemble strategy. Finally, experimental results on five datasets demonstrate that the proposed method outperforms state-of-the-art methods. | es_ES |
dc.description.sponsorship | National Natural Science Foundation of China 61702066 | es_ES |
dc.description.sponsorship | Chongqing Research Program of Basic Research and Frontier Technology, China cstc2021jcyj-msxmX0761 | es_ES |
dc.description.sponsorship | MICINN/AEI/10.13039/501100011033: PID2020-119478GB-I00 | es_ES |
dc.description.sponsorship | FEDER/Junta de Andalucía A-TIC-434-UGR20 | es_ES |
dc.language.iso | eng | es_ES |
dc.publisher | Elsevier | es_ES |
dc.rights | Attribution 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | * |
dc.subject | Speech emotion recognition | es_ES |
dc.subject | Speech spectrum | es_ES |
dc.subject | Spatial–temporal parallel network | es_ES |
dc.subject | Multiple fusion | es_ES |
dc.title | Speech emotion recognition via multiple fusion under spatial–temporal parallel network | es_ES |
dc.type | journal article | es_ES |
dc.rights.accessRights | open access | es_ES |
dc.identifier.doi | 10.1016/j.neucom.2023.126623 | |
dc.type.hasVersion | VoR | es_ES |