| dc.contributor.author | Lobato Martín, Javier | |
| dc.contributor.author | Pérez Córdoba, José Luis | |
| dc.contributor.author | González López, José Andrés | |
| dc.date.accessioned | 2024-11-11T09:17:36Z | |
| dc.date.available | 2024-11-11T09:17:36Z | |
| dc.date.issued | 2024-11-11 | |
| dc.identifier.citation | Lobato Martín, J., Pérez Córdoba, J.L., Gonzalez-Lopez, J.A. (2024) Direct Speech Synthesis from Non-audible Speech Biosignals: A Comparative Study. Proc. IberSPEECH 2024, 86-90, doi: 10.21437/IberSPEECH.2024-18 | es_ES |
| dc.identifier.uri | https://hdl.handle.net/10481/96807 | |
| dc.description | This work was supported by grant PID2022-141378OBC22 funded by MICIU/AEI/10.13039/501100011033 and by ERDF/EU. | es_ES |
| dc.description.abstract | This paper presents a speech restoration system that generates audible speech from articulatory movement data captured using Permanent Magnet Articulography (PMA). Several algorithms were explored for speech synthesis, including classical unit-selection and deep neural network (DNN) methods. A database containing simultaneous PMA and speech recordings from healthy subjects was used for training and validation. The system generates either direct waveforms or acoustic parameters, which are converted to audio via a vocoder. Results show intelligible speech synthesis is feasible, with Mel-Cepstral Distortion (MCD) values between 9.41 and 12.4 dB, and Short-Time Objective Intelligibility (STOI) scores ranging from 0.32 to 0.606, with a maximum near 0.9. Unit-selection and recurrent neural network (RNN) methods performed best. Informal listening tests further confirmed the effectiveness of these methods. | es_ES |
| dc.description.sponsorship | MICIU/AEI/10.13039/501100011033 PID2022-141378OBC22 | es_ES |
| dc.description.sponsorship | ERDF/EU | es_ES |
| dc.language.iso | eng | es_ES |
| dc.publisher | International Speech Communication Association (ISCA) | es_ES |
| dc.rights | Attribution-NonCommercial 4.0 International | * |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc/4.0/ | * |
| dc.title | Direct Speech Synthesis from Non-audible Speech Biosignals: A Comparative Study | es_ES |
| dc.type | conference output | es_ES |
| dc.rights.accessRights | open access | es_ES |
| dc.identifier.doi | 10.21437/IberSPEECH.2024-18 | |
| dc.type.hasVersion | VoR | es_ES |