Signal and Neural Processing against Spoofing Attacks and Deepfakes for Secure Voice Interaction (ASASVI)
Metadata
Author
Gómez García, Ángel Manuel; Peinado Herreros, Antonio Miguel; Sánchez Calle, Victoria Eugenia; López Espejo, Iván; Roselló Casado, Eros; Sanchez Valera, Jose Carlos; Martín Doñas, Juan Manuel
Subject
Speech-based interaction; Security; Voice biometrics systems; Voice impersonation; Anti-spoofing; Deepfakes; Artificial intelligence
Date
2024-11
Funding
Project PID2022-138711OB-I00 funded by MICIU/AEI/10.13039/501100011033 and by ERDF/EU.
Abstract
The increasing sophistication of multimodal interaction systems, which enable human-like communication, raises concerns regarding the authenticity of exchanged speech data. Our research addresses the challenges posed by the malicious misuse of speech technologies, such as voice conversion (VC) and text-to-speech (TTS), which can be exploited to impersonate speakers, manipulate public opinion, or compromise voice biometric systems. Existing countermeasures, known as anti-spoofing techniques, face significant limitations in effectively combating these threats. To tackle this, our project proposes three research directions: (1) improving deep neural network (DNN)-based anti-spoofing techniques through robust feature extractors, novel architectures, and enhanced training methodologies to bridge the gap between laboratory performance and real-world application, (2) generating more realistic and diverse training data to better reflect real-world conditions and attacks, and (3) developing advanced, imperceptible watermarking techniques for synthesized speech to prevent misuse, even in the presence of deep learning-based removal attempts. This research aims to significantly enhance the security and reliability of computer-mediated speech interactions.
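The record itself gives no implementation details. Purely as an illustration of research direction (1), the sketch below shows what a minimal DNN-based anti-spoofing classifier could look like: it scores an utterance's log-mel features as bonafide or spoofed (VC/TTS). The architecture, layer sizes, and the name AntiSpoofingCNN are hypothetical placeholders chosen for this example, not the project's actual system.

```python
# Illustrative sketch only: a minimal DNN anti-spoofing classifier that maps an
# utterance's log-mel spectrogram to bonafide/spoof scores. All layer choices
# are placeholder assumptions, not the ASASVI project's architecture.
import torch
import torch.nn as nn


class AntiSpoofingCNN(nn.Module):
    def __init__(self, n_mels: int = 80, n_classes: int = 2):
        super().__init__()
        # Small convolutional front end acting as a learned feature extractor.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # pool over time and frequency
        )
        # Binary head: bonafide vs. spoofed speech.
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, log_mel: torch.Tensor) -> torch.Tensor:
        # log_mel: (batch, n_mels, frames); add a channel dim for Conv2d.
        x = self.encoder(log_mel.unsqueeze(1))
        return self.classifier(x.flatten(1))


if __name__ == "__main__":
    model = AntiSpoofingCNN()
    dummy_batch = torch.randn(4, 80, 300)   # 4 utterances, 80 mel bins, 300 frames
    logits = model(dummy_batch)             # shape (4, 2): bonafide/spoof logits
    print(logits.softmax(dim=-1))
```

In practice, such a baseline would be trained on labeled bonafide and spoofed speech and evaluated with metrics such as equal error rate; the project's stated goal is precisely to improve on this kind of laboratory-grade pipeline so it holds up under real-world conditions.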