<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel rdf:about="https://hdl.handle.net/10481/64350">
<title>Grupo: Signal Processing, Multimedia Transmission and Speech/Audio Technologies (TIC234)</title>
<link>https://hdl.handle.net/10481/64350</link>
<description/>
<items>
<rdf:Seq>
<rdf:li rdf:resource="https://hdl.handle.net/10481/110976"/>
<rdf:li rdf:resource="https://hdl.handle.net/10481/105970"/>
<rdf:li rdf:resource="https://hdl.handle.net/10481/105969"/>
<rdf:li rdf:resource="https://hdl.handle.net/10481/98117"/>
<rdf:li rdf:resource="https://hdl.handle.net/10481/97890"/>
</rdf:Seq>
</items>
<dc:date>2026-04-12T05:47:10Z</dc:date>
</channel>
<item rdf:about="https://hdl.handle.net/10481/110976">
<title>Dual-Channel Spectral Weighting for Robust Speech Recognition in Mobile Devices</title>
<link>https://hdl.handle.net/10481/110976</link>
<description>Dual-Channel Spectral Weighting for Robust Speech Recognition in Mobile Devices
López-Espejo, Iván; Peinado, Antonio M.; Gomez, Angel M.; Gonzalez, Jose A.
</description>
</item>
<item rdf:about="https://hdl.handle.net/10481/105970">
<title>Recreating Neural Activity During Speech Production with Language and Speech Model Embeddings</title>
<link>https://hdl.handle.net/10481/105970</link>
<description>Recreating Neural Activity During Speech Production with Language and Speech Model Embeddings
Khanday, Owais Mujtaba; Rodríguez San Esteban, Pablo; Ahmad, Zubair; Ouellet, Marc; González López, José Andrés
Understanding how neural activity encodes speech and language production is a fundamental challenge in neuroscience and artificial intelligence. This study investigates whether embeddings from large-scale, self-supervised language and speech models can effectively reconstruct high-gamma neural activity characteristics, key indicators of cortical processing, recorded during speech production. We use pre-trained embeddings from deep learning models on linguistic and acoustic data to map high-level speech features onto high-gamma signals. We analyze the extent to which these embeddings preserve the spatio-temporal dynamics of brain activity. Reconstructed neural signals are evaluated against high-gamma ground-truth activity using correlation metrics and signal reconstruction quality assessments. The results indicate that high-gamma activity was effectively reconstructed using language and speech model embeddings, yielding Pearson correlation coefficients of 0.79–0.99 across all participants.
</description>
</item>
<item rdf:about="https://hdl.handle.net/10481/105969">
<title>NeuroIncept Decoder for High-Fidelity Speech Reconstruction from Neural Activity</title>
<link>https://hdl.handle.net/10481/105969</link>
<description>NeuroIncept Decoder for High-Fidelity Speech Reconstruction from Neural Activity
Khanday, Owais Mujtaba; Pérez Córdoba, José Luis; Mir, Mohd Yaqub; Najar, Ashfaq Ahmad; González López, José Andrés
This paper introduces a novel algorithm designed for speech synthesis from neural activity recordings obtained using invasive electroencephalography (EEG) techniques. The proposed system offers a promising communication solution for individuals with severe speech impairments. Central to our approach is the integration of time-frequency features in the high-gamma band computed from EEG recordings with an advanced NeuroIncept Decoder architecture. This neural network architecture combines Convolutional Neural Networks (CNNs) and Gated Recurrent Units (GRUs) to reconstruct audio spectrograms from neural patterns. Our model demonstrates robust mean correlation coefficients between predicted and actual spectrograms, though inter-subject variability indicates distinct neural processing mechanisms among participants. Overall, our study highlights the potential of neural decoding techniques to restore communicative abilities in individuals with speech disorders and paves the way for future advancements in brain-computer interface technologies.
</description>
</item>
<item rdf:about="https://hdl.handle.net/10481/98117">
<title>Integrating the Perceptual PMSQE Loss into DNN-based Speech Watermarking</title>
<link>https://hdl.handle.net/10481/98117</link>
<description>Integrating the Perceptual PMSQE Loss into DNN-based Speech Watermarking
Hernández-Manrique, Pablo; Peinado Herreros, Antonio Miguel; Gómez García, Ángel Manuel
Speech and audio watermarking has been an active research topic during the last thirty years. However, unlike other signal processing techniques, implementations based on deep neural networks (DNN) are relatively recent and many issues remain unexplored. In this paper, we focus on speech watermarking and one of its key requirements: the imperceptibility of the watermark. In particular, we explore the application of the Perceptual Metric for Speech Quality Evaluation (PMSQE) loss function, originally proposed in the context of speech enhancement, for achieving this goal. We examine the trade-offs associated with the watermarking system training procedure and look for a suitable way of incorporating the PMSQE loss. Our experimental results show that the PMSQE loss can not only meaningfully improve the perceptual quality of the watermarked speech, but also maintain, or even improve, other audio quality measures and the bit error rates yielded by attacked signals.
</description>
</item>
<item rdf:about="https://hdl.handle.net/10481/97890">
<title>Noise-Robust Hearing Aid Voice Control</title>
<link>https://hdl.handle.net/10481/97890</link>
<description>Noise-Robust Hearing Aid Voice Control
López Espejo, Iván; Roselló, Eros; Edraki, Amin; Harte, Naomi; Jensen, Jesper
Advancing the design of robust hearing aid (HA) voice control is crucial to increase the HA use rate among hard of hearing people as well as to improve HA users’ experience. In this work, we contribute towards this goal by, first, presenting a novel HA speech dataset consisting of noisy own voice captured by 2 behind-the-ear (BTE) and 1 in-ear-canal (IEC) microphones. Second, we provide baseline HA voice control results from the evaluation of light, state-of-the-art keyword spotting models utilizing different combinations of HA microphone signals. Experimental results show the benefits of exploiting bandwidth-limited bone-conducted speech (BCS) from the IEC microphone to achieve noise-robust HA voice control. Furthermore, results also demonstrate that voice control performance can be boosted by assisting BCS with the broader-bandwidth BTE microphone signals. Aiming at setting a baseline upon which the scientific community can continue to progress, the HA noisy speech dataset has been made publicly available.
</description>
</item>
</rdf:RDF>
