The role of window length and shift in complex-domain DNN-based speech enhancement
dc.contributor.author | García Ruíz, Celia | |
dc.contributor.author | Martín Doñas, Juan M. | |
dc.contributor.author | Gómez García, Ángel Manuel | |
dc.date.accessioned | 2023-03-16T07:08:53Z | |
dc.date.available | 2023-03-16T07:08:53Z | |
dc.date.issued | 2022-11 | |
dc.identifier.uri | https://hdl.handle.net/10481/80607 | |
dc.description.abstract | Deep learning techniques have been widely applied to speech enhancement as they show outstanding modeling capabilities that are needed for proper speech-noise separation. In contrast to other end-to-end approaches, masking-based methods consider speech spectra as input to the deep neural network, providing spectral masks for noise removal or attenuation. In these approaches, the Short-Time Fourier Transform (STFT) and, particularly, the parameters used for the analysis/synthesis window play an important role which is often neglected. In this paper, we analyze the effects of window length and shift on a complex-domain convolutional-recurrent neural network (DCCRN) which is able to provide, separately, magnitude and phase corrections. Different perceptual quality and intelligibility objective metrics are used to assess its performance. As a result, we have observed that phase corrections have an increased impact with shorter window sizes. Similarly, as window overlap increases, phase takes more relevance than magnitude spectrum in speech enhancement. | es_ES |
dc.description.sponsorship | Project PID2019-104206GB-I00 funded by MCIN/AEI/10.13039/501100011033. | es_ES |
dc.language.iso | eng | es_ES |
dc.publisher | ISCA - Iberspeech 2022 | es_ES |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | Speech enhancement | es_ES |
dc.subject | Deep neural network | es_ES |
dc.subject | Short Time Fourier Transform | es_ES |
dc.subject | Complex spectral masking | es_ES |
dc.title | The role of window length and shift in complex-domain DNN-based speech enhancement | es_ES |
dc.type | info:eu-repo/semantics/conferenceObject | es_ES |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | es_ES |
dc.identifier.doi | 10.21437/IberSPEECH.2022-30 | |
dc.type.hasVersion | info:eu-repo/semantics/submittedVersion | es_ES |