Show simple item record

dc.contributor.author: Martín Doñas, Juan Manuel
dc.contributor.author: Álvarez, Aitor
dc.contributor.author: Roselló Casado, Eros
dc.contributor.author: Gómez García, Ángel Manuel
dc.contributor.author: Peinado Herreros, Antonio Miguel
dc.date.accessioned: 2024-11-18T12:36:13Z
dc.date.available: 2024-11-18T12:36:13Z
dc.date.issued: 2024-08
dc.identifier.uri: https://hdl.handle.net/10481/97026
dc.description.abstract: This work explores the performance of large speech self-supervised models as robust audio deepfake detectors. Despite the current trend of fine-tuning the upstream network, in this paper we revisit the use of pre-trained models as feature extractors to adapt specialized downstream audio deepfake classifiers. The goal is to keep the general knowledge of the audio foundation model and extract discriminative features to feed a simplified deepfake classifier. In addition, the generalization capabilities of the system are improved by augmenting the training corpora with additional synthetic data from different vocoder algorithms. This strategy is complemented by various data augmentations covering challenging acoustic conditions. Our proposal is evaluated on different benchmark datasets for audio deepfake and anti-spoofing tasks, showing state-of-the-art performance. Furthermore, we analyze which parts of the downstream classifier are key to achieving a robust system. [es_ES]
dc.description.sponsorship: Project EITHOS under Grant Agreement No. 101073928 [es_ES]
dc.description.sponsorship: Project PID2022-138711OB-I00 funded by the MICIU/AEI/10.13039/501100011033 and by ERDF/EU [es_ES]
dc.description.sponsorship: FPI grant PRE2022-000363 [es_ES]
dc.language.iso: eng [es_ES]
dc.rights: Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License [en_EN]
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: audio deepfake detection [es_ES]
dc.subject: anti-spoofing [es_ES]
dc.subject: self-supervised models [es_ES]
dc.subject: data augmentation [es_ES]
dc.subject: vocoders [es_ES]
dc.title: Exploring Self-supervised Embeddings and Synthetic Data Augmentation for Robust Audio Deepfake Detection [es_ES]
dc.type: conference output [es_ES]
dc.rights.accessRights: open access [es_ES]
dc.identifier.doi: 10.21437/Interspeech.2024-942
dc.type.hasVersion: VoR [es_ES]
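
The abstract above describes keeping a pre-trained speech self-supervised model frozen as a feature extractor and adapting a simplified downstream classifier on its embeddings. The following is a minimal sketch of that general setup in PyTorch; the upstream checkpoint (facebook/wav2vec2-xls-r-300m), the mean pooling, and the two-layer head are illustrative assumptions, not the authors' exact configuration.

# Minimal sketch (not the authors' exact architecture): a frozen pre-trained
# speech SSL model used as a feature extractor feeding a small downstream
# bona fide vs. spoof classifier. Checkpoint name and head design are
# illustrative assumptions.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class DeepfakeDetector(nn.Module):
    def __init__(self, upstream_name: str = "facebook/wav2vec2-xls-r-300m"):
        super().__init__()
        # Pre-trained upstream kept frozen: only its general-purpose speech
        # representations are used, as described in the abstract.
        self.upstream = Wav2Vec2Model.from_pretrained(upstream_name)
        for p in self.upstream.parameters():
            p.requires_grad = False
        hidden = self.upstream.config.hidden_size
        # Simplified downstream classifier adapted on top of the embeddings.
        self.head = nn.Sequential(
            nn.Linear(hidden, 256),
            nn.ReLU(),
            nn.Linear(256, 2),  # two classes: bona fide, spoofed
        )

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples), raw audio at 16 kHz
        with torch.no_grad():
            feats = self.upstream(waveform).last_hidden_state  # (B, T, H)
        pooled = feats.mean(dim=1)  # simple temporal average pooling
        return self.head(pooled)    # logits over {bona fide, spoof}

if __name__ == "__main__":
    model = DeepfakeDetector()
    dummy = torch.randn(1, 16000)  # 1 s of placeholder audio
    print(model(dummy).shape)      # torch.Size([1, 2])

In this arrangement only the small head is trained, so the general speech knowledge of the upstream model is preserved while the classifier specializes in separating bona fide from spoofed audio.
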


Files in this item

[PDF]

This item appears in the following Collection(s)


Except where otherwise noted, this item's license is described as Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License