TIC234 - Comunicación Congresos, Conferencias... (conference and congress communications)

https://hdl.handle.net/10481/64355
Sun, 05 Apr 2026 20:37:33 GMT

Recreating Neural Activity During Speech Production with Language and Speech Model Embeddings
https://hdl.handle.net/10481/105970
Khanday, Owais Mujtaba; Rodríguez San Esteban, Pablo; Ahmad, Zubair; Ouellet, Marc; González López, José Andrés
Understanding how neural activity encodes speech and language production is a fundamental challenge in neuroscience and artificial intelligence. This study investigates whether embeddings from large-scale, self-supervised language and speech models can effectively reconstruct the characteristics of high-gamma neural activity recorded during speech production, a key indicator of cortical processing. We use embeddings from deep learning models pre-trained on linguistic and acoustic data to map high-level speech features onto high-gamma signals, and we analyze the extent to which these embeddings preserve the spatio-temporal dynamics of brain activity. Reconstructed neural signals are evaluated against ground-truth high-gamma activity using correlation metrics and signal reconstruction quality assessments. The results indicate that high-gamma activity was effectively reconstructed from language and speech model embeddings, yielding Pearson correlation coefficients of 0.79–0.99 across all participants.

NeuroIncept Decoder for High-Fidelity Speech Reconstruction from Neural Activity
https://hdl.handle.net/10481/105969
Khanday, Owais Mujtaba; Pérez Córdoba, José Luis; Mir, Mohd Yaqub; Najar, Ashfaq Ahmad; González López, José Andrés
This paper introduces a novel algorithm for synthesizing speech from neural activity recorded with invasive electroencephalography (EEG) techniques.
The proposed system offers a promising communication solution for individuals with severe speech impairments. Central to our approach is the integration of time-frequency features computed from the high-gamma band of the EEG recordings with an advanced NeuroIncept Decoder architecture. This neural network combines convolutional neural networks (CNNs) and gated recurrent units (GRUs) to reconstruct audio spectrograms from neural patterns. Our model achieves strong mean correlation coefficients between predicted and actual spectrograms, although inter-subject variability indicates distinct neural processing mechanisms among participants. Overall, our study highlights the potential of neural decoding techniques to restore communicative abilities in individuals with speech disorders and paves the way for future advances in brain-computer interface technologies.

Integrating the Perceptual PMSQE Loss into DNN-based Speech Watermarking
https://hdl.handle.net/10481/98117
Hernández-Manrique, Pablo; Peinado Herreros, Antonio Miguel; Gómez García, Ángel Manuel
Speech and audio watermarking has been an active research topic for the last thirty years. However, unlike in other signal processing fields, implementations based on deep neural networks (DNNs) are relatively recent, and many issues remain unexplored. In this paper, we focus on speech watermarking and on one of its key requirements: the imperceptibility of the watermark. Specifically, we explore the use of the Perceptual Metric for Speech Quality Evaluation (PMSQE) loss function, originally proposed in the context of speech enhancement, to achieve this goal. We examine the trade-offs involved in training the watermarking system and look for a suitable way of incorporating the PMSQE loss.
Our experimental results show that the PMSQE loss can not only meaningfully improve the perceptual quality of the watermarked speech but also maintain, or even improve, other audio quality measures and the bit error rates obtained on attacked signals.
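Two of the evaluation measures named in these abstracts are standard and easy to state: the Pearson correlation coefficient used to compare reconstructed and ground-truth signals (first two entries), and the bit error rate of the watermark recovered from attacked signals (third entry). The following is a minimal plain-Python sketch of both, assuming equal-length sequences of samples and of 0/1 bits; the function names `pearson_r` and `bit_error_rate` are hypothetical, and this is not the authors' evaluation code.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length signals.

    Illustrative only (hypothetical helper, not the authors' code): x and y
    stand for, e.g., a reconstructed and a ground-truth high-gamma trace.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("signals must have the same length and at least 2 samples")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variances left unnormalized: the 1/n factors cancel.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / math.sqrt(var_x * var_y)


def bit_error_rate(embedded: Sequence[int], decoded: Sequence[int]) -> float:
    """Fraction of watermark bits recovered incorrectly (hypothetical helper)."""
    if len(embedded) != len(decoded) or not embedded:
        raise ValueError("payloads must be non-empty and of equal length")
    errors = sum(1 for sent, got in zip(embedded, decoded) if sent != got)
    return errors / len(embedded)


if __name__ == "__main__":
    print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 (perfectly linear)
    print(bit_error_rate([1, 0, 1, 1, 0, 0, 1, 0],
                         [1, 0, 0, 1, 0, 1, 1, 0]))  # 0.25 (2 of 8 bits flipped)
```

In practice the correlation would typically be computed per channel or per spectrogram bin and then averaged, and the bit error rate over many attacked signals; how the cited papers aggregate these values is not stated in the abstracts.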