<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel rdf:about="https://hdl.handle.net/10481/64350">
<title>Grupo: Signal Processing, Multimedia Transmission and Speech/Audio Technologies (TIC234)</title>
<link>https://hdl.handle.net/10481/64350</link>
<description/>
<items>
<rdf:Seq>
<rdf:li rdf:resource="https://hdl.handle.net/10481/110976"/>
<rdf:li rdf:resource="https://hdl.handle.net/10481/105970"/>
<rdf:li rdf:resource="https://hdl.handle.net/10481/105969"/>
<rdf:li rdf:resource="https://hdl.handle.net/10481/98117"/>
<rdf:li rdf:resource="https://hdl.handle.net/10481/97890"/>
</rdf:Seq>
</items>
<dc:date>2026-04-12T05:47:10Z</dc:date>
</channel>
<item rdf:about="https://hdl.handle.net/10481/110976">
<title>Dual-Channel Spectral Weighting for Robust Speech Recognition in Mobile Devices</title>
<link>https://hdl.handle.net/10481/110976</link>
<description>Dual-Channel Spectral Weighting for Robust Speech Recognition in Mobile Devices
López-Espejo, Iván; Peinado, Antonio M.; Gomez, Angel M.; Gonzalez, Jose A.
</description>
</item>
<item rdf:about="https://hdl.handle.net/10481/105970">
<title>Recreating Neural Activity During Speech Production with Language and Speech Model Embeddings</title>
<link>https://hdl.handle.net/10481/105970</link>
<description>Recreating Neural Activity During Speech Production with Language and Speech Model Embeddings
Khanday, Owais Mujtaba; Rodríguez San Esteban, Pablo; Ahmad, Zubair; Ouellet, Marc; González López, José Andrés
Understanding how neural activity encodes speech and language production is a fundamental challenge in neuroscience and artificial intelligence. This study investigates whether embeddings from large-scale, self-supervised language and speech models can effectively reconstruct high-gamma neural activity characteristics, key indicators of cortical processing, recorded during speech production. We use pre-trained embeddings from deep learning models on linguistic and acoustic data to map high-level speech features onto high-gamma signals. We analyze the extent to which these embeddings preserve the spatio-temporal dynamics of brain activity. Reconstructed neural signals are evaluated against high-gamma ground-truth activity using correlation metrics and signal reconstruction quality assessments. The results indicate that high-gamma activity was effectively reconstructed using language and speech model embeddings, yielding Pearson correlation coefficients of 0.79–0.99 across all participants.
</description>
</item>
<item rdf:about="https://hdl.handle.net/10481/105969">
<title>NeuroIncept Decoder for High-Fidelity Speech Reconstruction from Neural Activity</title>
<link>https://hdl.handle.net/10481/105969</link>
<description>NeuroIncept Decoder for High-Fidelity Speech Reconstruction from Neural Activity
Khanday, Owais Mujtaba; Pérez Córdoba, José Luis; Mir, Mohd Yaqub; Najar, Ashfaq Ahmad; González López, José Andrés
This paper introduces a novel algorithm designed for speech synthesis from neural activity recordings obtained using invasive electroencephalography (EEG) techniques. The proposed system offers a promising communication solution for individuals with severe speech impairments. Central to our approach is the integration of time-frequency features in the high-gamma band computed from EEG recordings with an advanced NeuroIncept Decoder architecture. This neural network architecture combines Convolutional Neural Networks (CNNs) and Gated Recurrent Units (GRUs) to reconstruct audio spectrograms from neural patterns. Our model demonstrates robust mean correlation coefficients between predicted and actual spectrograms, though inter-subject variability indicates distinct neural processing mechanisms among participants. Overall, our study highlights the potential of neural decoding techniques to restore communicative abilities in individuals with speech disorders and paves the way for future advancements in brain-computer interface technologies.
</description>
</item>
<item rdf:about="https://hdl.handle.net/10481/98117">
<title>Integrating the Perceptual PMSQE Loss into DNN-based Speech Watermarking</title>
<link>https://hdl.handle.net/10481/98117</link>
<description>Integrating the Perceptual PMSQE Loss into DNN-based Speech Watermarking
Hernández-Manrique, Pablo; Peinado Herreros, Antonio Miguel; Gómez García, Ángel Manuel
Speech and audio watermarking has been an active research topic during the last thirty years. However, unlike other signal processing techniques, implementations based on deep neural networks (DNN) are relatively recent and many issues remain unexplored. In this paper, we focus on speech watermarking and one of its key requirements: the imperceptibility of the watermark. In particular, we explore the application of the Perceptual Metric for Speech Quality Evaluation (PMSQE) loss function, originally proposed in the context of speech enhancement, for achieving this goal. We examine the trade-offs associated with the watermarking system training procedure and look for a suitable way of incorporating the PMSQE loss. Our experimental results show that the PMSQE loss can not only meaningfully improve the perceptual quality of the watermarked speech, but also maintain, or even improve, other audio quality measures and the bit error rates yielded by attacked signals.
</description>
</item>
<item rdf:about="https://hdl.handle.net/10481/97890">
<title>Noise-Robust Hearing Aid Voice Control</title>
<link>https://hdl.handle.net/10481/97890</link>
<description>Noise-Robust Hearing Aid Voice Control
López Espejo, Iván; Roselló, Eros; Edraki, Amin; Harte, Naomi; Jensen, Jesper
Advancing the design of robust hearing aid (HA) voice control is crucial to increase the HA use rate among hard of hearing people as well as to improve HA users’ experience. In this work, we contribute towards this goal by, first, presenting a novel HA speech dataset consisting of noisy own voice captured by 2 behind-the-ear (BTE) and 1 in-ear-canal (IEC) microphones. Second, we provide baseline HA voice control results from the evaluation of light, state-of-the-art keyword spotting models utilizing different combinations of HA microphone signals. Experimental results show the benefits of exploiting bandwidth-limited bone-conducted speech (BCS) from the IEC microphone to achieve noise-robust HA voice control. Furthermore, results also demonstrate that voice control performance can be boosted by assisting BCS with the broader-bandwidth BTE microphone signals. Aiming at setting a baseline upon which the scientific community can continue to progress, the HA noisy speech dataset has been made publicly available.
</description>
</item>
</rdf:RDF>
