A Deep Learning Loss Function Based on the Perceptual Evaluation of the Speech Quality

Martín Doñas, Juan M.; Gómez García, Ángel Manuel; González López, José Andrés; Peinado Herreros, Antonio Miguel

doi:10.1109/LSP.2018.2871419

Identifiers

URI: http://hdl.handle.net/10481/71497

DOI: 10.1109/LSP.2018.2871419

Publisher

IEEE

Subject

Deep learning (DL)

Speech enhancement

Date

2018-09-19

Bibliographic reference

Martín-Doñas, J. M., Gomez, A. M., Gonzalez, J. A., & Peinado, A. M. (2018). A deep learning loss function based on the perceptual evaluation of the speech quality. IEEE Signal Processing Letters, 25(11), 1680-1684.

Sponsor

Spanish MINECO/FEDER (Grant Number: TEC2016-80141-P); Spanish Ministry of Education through the National Program FPU (Grant Number: FPU15/04161); NVIDIA Corporation with the donation of a Titan X GPU

Abstract

This letter proposes a perceptual metric for speech quality evaluation that is suitable as a loss function for training deep learning methods. This metric, derived from the perceptual evaluation of speech quality (PESQ) algorithm, is computed on a per-frame basis from the power spectra of the reference and processed speech signals. Two disturbance terms, which account for distortion once auditory masking and threshold effects are factored in, amend the mean square error (MSE) loss function by introducing perceptual criteria based on human psychoacoustics. The proposed loss function is evaluated for noisy speech enhancement with deep neural networks. Experimental results show that the metric achieves significant gains in speech quality (evaluated using an objective metric and a listening test) compared to MSE or other perceptually based loss functions from the literature.
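The structure described in the abstract, a per-frame MSE on power spectra amended by two disturbance terms, can be illustrated with a rough sketch. This is not the letter's actual formulation: the two disturbance terms below are simplified stand-ins (a thresholded symmetric penalty and an added-energy-only penalty), and the function name `perceptual_loss_sketch`, the weights `alpha` and `beta`, and the `floor` threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def perceptual_loss_sketch(ref_power, proc_power, alpha=0.1, beta=0.1, floor=0.01):
    """Illustrative per-frame loss: MSE plus two placeholder disturbance terms.

    ref_power, proc_power: arrays of shape (frames, bins) holding power spectra.
    alpha, beta, floor: hypothetical parameters, not taken from the letter.
    """
    ref_power = np.asarray(ref_power, dtype=float)
    proc_power = np.asarray(proc_power, dtype=float)
    diff = proc_power - ref_power

    # Per-frame mean square error between the two power spectra.
    mse = np.mean(diff ** 2, axis=1)

    # Crude stand-in for a threshold effect: differences smaller than
    # `floor` are treated as inaudible and set to zero (dead zone).
    audible = np.sign(diff) * np.maximum(np.abs(diff) - floor, 0.0)

    # Placeholder disturbance 1: penalizes audible differences of either sign.
    dist_sym = np.sqrt(np.mean(audible ** 2, axis=1))

    # Placeholder disturbance 2: penalizes only energy added to the reference.
    dist_add = np.mean(np.maximum(audible, 0.0), axis=1)

    # Combine per frame, then average over frames to get a scalar loss.
    per_frame = mse + alpha * dist_sym + beta * dist_add
    return float(np.mean(per_frame))
```

With these stand-ins, identical inputs give a loss of zero, and adding energy to the reference is penalized more than removing the same amount. A real implementation would replace the placeholders with psychoacoustically motivated terms and would be written in a framework with automatic differentiation so that it can be used for training.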

Collections

DTSTC - Articles

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Spain