A Deep Learning Loss Function Based on the Perceptual Evaluation of the Speech Quality

Martín Doñas, Juan M.; Gómez García, Ángel Manuel; González López, José Andrés; Peinado Herreros, Antonio Miguel

doi:10.1109/LSP.2018.2871419

dc.contributor.author	Martín Doñas, Juan M.
dc.contributor.author	Gómez García, Ángel Manuel
dc.contributor.author	González López, José Andrés
dc.contributor.author	Peinado Herreros, Antonio Miguel
dc.date.accessioned	2021-11-15T07:28:13Z
dc.date.available	2021-11-15T07:28:13Z
dc.date.issued	2018-09-19
dc.identifier.citation	Martín-Doñas, J. M., Gomez, A. M., Gonzalez, J. A., & Peinado, A. M. (2018). A deep learning loss function based on the perceptual evaluation of the speech quality. IEEE Signal Processing Letters, 25(11), 1680-1684.	es_ES
dc.identifier.uri	http://hdl.handle.net/10481/71497
dc.description.abstract	This letter proposes a perceptual metric for speech quality evaluation that is suitable as a loss function for training deep learning methods. This metric, derived from the perceptual evaluation of the speech quality algorithm, is computed on a per-frame basis from the power spectra of the reference and processed speech signals. Two disturbance terms, which account for distortion once auditory masking and threshold effects are factored in, amend the mean square error (MSE) loss function by introducing perceptual criteria based on human psychoacoustics. The proposed loss function is evaluated for noisy speech enhancement with deep neural networks. Experimental results show that our metric achieves significant gains in speech quality (evaluated using an objective metric and a listening test) when compared to using MSE or other perceptual-based loss functions from the literature.	es_ES
dc.description.sponsorship	Spanish MINECO/FEDER (Grant Number: TEC2016-80141-P)	es_ES
dc.description.sponsorship	Spanish Ministry of Education through the National Program FPU (Grant Number: FPU15/04161)	es_ES
dc.description.sponsorship	NVIDIA Corporation with the donation of a Titan X GPU	es_ES
dc.language.iso	eng	es_ES
dc.publisher	IEEE	es_ES
dc.rights	Atribución-NoComercial-SinDerivadas 3.0 España	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/	*
dc.subject	Deep learning (DL)	es_ES
dc.subject	Speech enhancement	es_ES
dc.title	A Deep Learning Loss Function Based on the Perceptual Evaluation of the Speech Quality	es_ES
dc.type	journal article	es_ES
dc.rights.accessRights	open access	es_ES
dc.identifier.doi	10.1109/LSP.2018.2871419
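
The abstract describes the loss as the MSE amended by two disturbance terms, evaluated frame by frame. The sketch below illustrates only that structure. The function name, the two disturbance callables, the weights, and the choice to compute the MSE directly on the supplied frames are all assumptions made for illustration; the record does not specify how the disturbance terms are derived or weighted.

```python
def amended_mse_loss(ref_frames, proc_frames, disturbance_1, disturbance_2,
                     weight_1=1.0, weight_2=1.0):
    """Average over frames of: MSE + weight_1 * D1 + weight_2 * D2.

    ref_frames, proc_frames: lists of frames, each a list of spectral values.
    disturbance_1, disturbance_2: hypothetical callables that take one
        reference frame and one processed frame and return a float.
    weight_1, weight_2: hypothetical weights (not taken from the record).
    """
    if not ref_frames or len(ref_frames) != len(proc_frames):
        raise ValueError("need the same non-zero number of frames")

    total = 0.0
    for ref, proc in zip(ref_frames, proc_frames):
        if not ref or len(ref) != len(proc):
            raise ValueError("frames must be non-empty and equally long")
        # Plain mean square error for this frame.
        mse = sum((r - p) ** 2 for r, p in zip(ref, proc)) / len(ref)
        # Amend the MSE with the two weighted disturbance terms.
        total += (mse
                  + weight_1 * disturbance_1(ref, proc)
                  + weight_2 * disturbance_2(ref, proc))
    return total / len(ref_frames)


# Placeholder disturbance that always returns zero, so the loss reduces to MSE.
no_disturbance = lambda ref, proc: 0.0
baseline = amended_mse_loss([[1.0, 2.0]], [[1.0, 4.0]],
                            no_disturbance, no_disturbance)
```

Passing the disturbance terms in as callables keeps the sketch independent of any particular psychoacoustic model; with both set to zero it reduces to the ordinary per-frame MSE.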

Files in this item

Name: single.pdf
Size: 196.8 KB
Format: PDF

This item appears in the following collection(s)

DTSTC - Artículos

Except where otherwise noted, this document's license is described as Atribución-NoComercial-SinDerivadas 3.0 España