Direct Speech Synthesis from Non-audible Speech Biosignals: A Comparative Study
Metadata
Publisher
International Speech Communication Association (ISCA)
Date
2024-11-11
Bibliographic reference
Lobato Martín, J., Pérez Córdoba, J. L., & Gonzalez-Lopez, J. A. (2024). Direct Speech Synthesis from Non-audible Speech Biosignals: A Comparative Study. Proc. IberSPEECH 2024, 86-90. doi: 10.21437/IberSPEECH.2024-18
Funding
MICIU/AEI/10.13039/501100011033 PID2022-141378OBC22; ERDF/EU
Abstract
This paper presents a speech restoration system that generates audible speech from articulatory movement data captured using Permanent Magnet Articulography (PMA). Several algorithms were explored for speech synthesis, including classical unit-selection and deep neural network (DNN) methods. A database containing simultaneous PMA and speech recordings from healthy subjects was used for training and validation. The system generates either direct waveforms or acoustic parameters, which are converted to audio via a vocoder. Results show that intelligible speech synthesis is feasible, with Mel-Cepstral Distortion (MCD) values between 9.41 and 12.4 dB, and Short-Time Objective Intelligibility (STOI) scores ranging from 0.32 to 0.606, with a maximum near 0.9. Unit-selection and recurrent neural network (RNN) methods performed best, and informal listening tests further confirmed the effectiveness of these approaches.
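
As an illustration of how MCD figures like those above are typically computed, the following Python sketch implements the standard mel-cepstral distortion formula over aligned frame sequences. The variable names, coefficient count, and the simple truncation-based alignment are assumptions for illustration, not details taken from the paper (evaluations often align frames with dynamic time warping instead). STOI scores can similarly be obtained with an off-the-shelf implementation such as the pystoi package, though the paper does not state which implementation was used.

    # Minimal sketch of the standard Mel-Cepstral Distortion (MCD) metric.
    # Assumed setup, not the paper's code: frames are pre-aligned, and the
    # 0th (energy) cepstral coefficient has already been excluded.
    import numpy as np

    def mel_cepstral_distortion(ref_mcep: np.ndarray, syn_mcep: np.ndarray) -> float:
        """Mean MCD in dB between two mel-cepstral sequences of shape
        (n_frames, n_coeffs)."""
        # Truncate to the shorter sequence as a crude stand-in for DTW alignment.
        n = min(len(ref_mcep), len(syn_mcep))
        diff = ref_mcep[:n] - syn_mcep[:n]
        # Standard formula: (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2)
        frame_mcd = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
        return float(np.mean(frame_mcd))

    if __name__ == "__main__":
        # Synthetic demo with random mel-cepstra (24 coefficients per frame,
        # an arbitrary choice for illustration).
        rng = np.random.default_rng(0)
        ref = rng.standard_normal((200, 24))
        syn = ref + 0.1 * rng.standard_normal((200, 24))
        print(f"MCD: {mel_cepstral_distortion(ref, syn):.2f} dB")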