On the Application of Conformers to Logical Access Voice Spoofing Attack Detection
Roselló Casado, Eros; Gómez Alanís, Alejandro; Chica Villar, Manuel; Gómez García, Ángel Manuel; González López, José Andrés; Peinado Herreros, Antonio Miguel
Spoofing detection; Deep learning; Conformers
Biometric systems are exposed to spoofing attacks which may compromise their security, and automatic speaker verification (ASV) is no exception. To increase robustness against such attacks, anti-spoofing systems have been proposed for the detection of spoofed audio. However, most of these systems extract only local features and cannot capture long-term feature dependencies. While transformers are an excellent solution for exploiting these long-distance correlations, they may degrade local details. Conversely, convolutional neural networks (CNNs) are a powerful tool for extracting local features but are less suited to capturing global representations. The conformer is a model that combines the best of both techniques, CNNs and transformers, to model both local and global dependencies, and it has achieved state-of-the-art performance in speech recognition. While conformers have been mainly applied to sequence-to-sequence problems, in this work we make a preliminary study of their adaptation to a binary classification task such as anti-spoofing, with a focus on synthesis and voice-conversion-based attacks. To evaluate our proposals, experiments were carried out on the ASVspoof 2019 logical access database. The experimental results show that the proposed system can obtain encouraging results, although more research will be required in order to outperform other state-of-the-art systems.
2023-03-16T07:21:15Z 2023-03-16T07:21:15Z 2022-11 conference output https://hdl.handle.net/10481/80609 10.21437/IberSPEECH.2022-37 eng http://creativecommons.org/licenses/by-nc-nd/4.0/ open access Attribution-NonCommercial-NoDerivatives 4.0 International ISCA - Iberspeech 2022