On the Application of Conformers to Logical Access Voice Spoofing Attack Detection
Metadata
Author
Roselló Casado, Eros; Gómez Alanís, Alejandro; Chica Villar, Manuel; Gómez García, Ángel Manuel; González López, José Andrés; Peinado Herreros, Antonio Miguel
Publisher
ISCA - Iberspeech 2022
Subject
Spoofing detection; Deep learning; Conformers
Date
2022-11
Sponsorship
Project PID2019-104206GB-I00 funded by MCIN/AEI/10.13039/501100011033; FEDER/Junta de Andalucía-Consejería de Transformación Económica, Industria, Conocimiento y Universidades, Project PY20_00902
Abstract
Biometric systems are exposed to spoofing attacks which may compromise their security, and automatic speaker verification (ASV) is no exception. To increase the robustness against such attacks, anti-spoofing systems have been proposed for the detection of spoofed audio attacks. However, most of these systems cannot capture long-term feature dependencies and can only extract local features. While transformers are an excellent solution for the exploitation of these long-distance correlations, they may degrade local details. Conversely, convolutional neural networks (CNNs) are a powerful tool for extracting local features but not so much for capturing global representations. The conformer is a model that combines the best of both techniques, CNNs and transformers, to model both local and global dependencies, and it has been used for speech recognition, achieving state-of-the-art performance. While conformers have been mainly applied to sequence-to-sequence problems, in this work we make a preliminary study of their adaptation to a binary classification task such as anti-spoofing, with a focus on synthesis and voice-conversion-based attacks. To evaluate our proposals, experiments were carried out on the ASVspoof 2019 logical access database. The experimental results show that the proposed system can obtain encouraging results, although more research will be required in order to outperform other state-of-the-art systems.