Direct Speech Synthesis from Non-audible Speech Biosignals: A Comparative Study
Metadata
Publisher
International Speech Communication Association (ISCA)
Date
2024-11-11
Bibliographic reference
Lobato Martín, J., Pérez Córdoba, J.L., Gonzalez-Lopez, J.A. (2024) Direct Speech Synthesis from Non-audible Speech Biosignals: A Comparative Study. Proc. IberSPEECH 2024, 86-90, doi: 10.21437/IberSPEECH.2024-18
Sponsorship
MICIU/AEI/10.13039/501100011033 PID2022-141378OBC22; ERDF/EU
Abstract
This paper presents a speech restoration system that generates audible speech from articulatory movement data captured using Permanent Magnet Articulography (PMA). Several algorithms were explored for speech synthesis, including classical unit-selection and deep neural network (DNN) methods. A database containing simultaneous PMA and speech recordings from healthy subjects was used for training and validation. The system generates either direct waveforms or acoustic parameters, which are converted to audio via a vocoder. Results show that intelligible speech synthesis is feasible, with Mel-Cepstral Distortion (MCD) values between 9.41 and 12.4 dB, and Short-Time Objective Intelligibility (STOI) scores ranging from 0.32 to 0.606, with a maximum near 0.9. Unit-selection and recurrent neural network (RNN) methods performed best. Informal listening tests further confirmed the effectiveness of these methods.
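
For context, the MCD figures quoted in the abstract are conventionally computed as a frame-averaged log-spectral distance between time-aligned mel-cepstral coefficient sequences of the reference and synthesized utterances. The following Python sketch illustrates the standard formula only; it is not taken from the paper, and the array shapes, coefficient count, and prior DTW alignment are assumptions.

import numpy as np

def mel_cepstral_distortion(mcep_ref: np.ndarray, mcep_syn: np.ndarray) -> float:
    """Frame-averaged Mel-Cepstral Distortion (MCD) in dB.

    Both inputs are (num_frames, num_coeffs) arrays of mel-cepstral
    coefficients, assumed already time-aligned (e.g. via DTW) and with
    the 0th (energy) coefficient excluded so loudness differences do
    not dominate the distance.
    """
    assert mcep_ref.shape == mcep_syn.shape
    diff = mcep_ref - mcep_syn
    # Standard MCD constant: (10 / ln 10) * sqrt(2)
    const = 10.0 / np.log(10.0) * np.sqrt(2.0)
    per_frame = const * np.sqrt(np.sum(diff ** 2, axis=1))
    return float(np.mean(per_frame))

# Illustrative usage with random stand-in data
# (100 frames, 24 mel-cepstral coefficients -- hypothetical values)
rng = np.random.default_rng(0)
ref = rng.normal(size=(100, 24))
syn = ref + rng.normal(scale=0.1, size=(100, 24))
print(f"MCD: {mel_cepstral_distortion(ref, syn):.2f} dB")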