A conformer-based classifier for variable-length utterance processing in anti-spoofing

Roselló Casado, Eros; Gómez Alanís, Alejandro; Gómez García, Ángel Manuel; Peinado Herreros, Antonio Miguel

doi:10.21437/Interspeech.2023-1820

rosello23_interspeech.pdf (312.0Kb)

Identifiers

URI: https://hdl.handle.net/10481/88807

Publisher

ISCA - Interspeech 2023

Date

2023

Sponsor

FEDER/Junta de Andalucía-Consejería de Transformación Económica, Industria, Conocimiento y Universidades, Project PY20_00902; Project PID2019-104206GB-I00 funded by MCIN/AEI/10.13039/501100011033

Abstract

The success achieved by conformers in Automatic Speech Recognition (ASR) motivates their application in other domains, such as spoofing detection for automatic speaker verification (ASV), where the conformer self-attention mechanism might effectively model and detect the artifacts introduced in spoofed speech signals. In addition, conformers can naturally handle the variable duration of speech utterances. However, as with transformers, conformer performance may degrade when training data is limited. To address this issue, we propose using conformers in conjunction with self-supervised learning, specifically leveraging wav2vec 2.0, a model pre-trained on a substantial amount of bona fide data. Our experimental results demonstrate that the proposed method achieves some of the best results reported on the recent ASVspoof 2021 logical access (LA) and deepfake (DF) databases.
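
The abstract's point about variable-duration utterances can be illustrated with a small, hedged sketch. It is not the authors' implementation: it only shows one common way attention-based models can batch utterances of different lengths, by padding them to a common length and masking the padded frames so they receive zero attention weight. The function names `pad_batch` and `masked_attention_pool`, the toy feature vectors, and the fixed `query` vector (standing in for a learned parameter) are all assumptions made for illustration.

```python
import math


def pad_batch(utterances, pad_value=0.0):
    """Pad variable-length frame sequences to a common length.

    Each utterance is a non-empty list of feature vectors (lists of floats).
    Returns the padded batch and a boolean mask (True marks a real frame).
    """
    max_len = max(len(u) for u in utterances)
    dim = len(utterances[0][0])
    padded, mask = [], []
    for u in utterances:
        n_pad = max_len - len(u)
        padded.append([list(f) for f in u] + [[pad_value] * dim for _ in range(n_pad)])
        mask.append([True] * len(u) + [False] * n_pad)
    return padded, mask


def masked_attention_pool(frames, frame_mask, query):
    """Pool frames into one vector with softmax attention over real frames only."""
    dim = len(query)
    # Padded frames get a score of -inf, so their softmax weight is exactly 0.
    scores = [
        sum(q * x for q, x in zip(query, f)) / math.sqrt(dim) if keep else -math.inf
        for f, keep in zip(frames, frame_mask)
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


# Toy data (hypothetical): a 2-frame and a 4-frame utterance with 2-dim features.
short = [[1.0, 0.0], [0.0, 1.0]]
long = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
batch, mask = pad_batch([short, long])

query = [0.3, -0.2]  # stands in for a learned attention query
pooled_padded, weights_padded = masked_attention_pool(batch[0], mask[0], query)
pooled_alone, _ = masked_attention_pool(short, [True, True], query)
```

With the mask in place, pooling the padded copy of the short utterance gives the same vector as pooling the short utterance on its own, so padding does not change an utterance's representation.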

Collections

DTSTC - Comunicaciones congresos, conferencias, ...