Learning Visual Voice Activity Detection with an Automatically Annotated Dataset

Guy, Sylvain; Lathuilière, Stéphane; Mesejo Santiago, Pablo; Horaud, Radu

dc.contributor.author	Guy, Sylvain
dc.contributor.author	Lathuilière, Stéphane
dc.contributor.author	Mesejo Santiago, Pablo
dc.contributor.author	Horaud, Radu
dc.date.accessioned	2021-10-04T07:11:17Z
dc.date.available	2021-10-04T07:11:17Z
dc.date.issued	2020-10-16
dc.identifier.citation	Published version: S. Guy... [et al.]. "Learning Visual Voice Activity Detection with an Automatically Annotated Dataset," 2020 25th International Conference on Pattern Recognition (ICPR), 2021, pp. 4851-4856, doi: [10.1109/ICPR48806.2021.9412884]	es_ES
dc.identifier.uri	http://hdl.handle.net/10481/70588
dc.description	This work has been funded by the EU H2020 project #871245 SPRING and by the Multidisciplinary Institute in Artificial Intelligence (MIAI) #ANR-19-P3IA-0003.	es_ES
dc.description.abstract	Visual voice activity detection (V-VAD) uses visual features to predict whether a person is speaking or not. V-VAD is useful whenever audio VAD (A-VAD) is inefficient either because the acoustic signal is difficult to analyze or because it is simply missing. We propose two deep architectures for V-VAD, one based on facial landmarks and one based on optical flow. Moreover, available datasets, used for learning and for testing V-VAD, lack content variability. We introduce a novel methodology to automatically create and annotate very large datasets in-the-wild – WildVVAD – based on combining A-VAD with face detection and tracking. A thorough empirical evaluation shows the advantage of training the proposed deep V-VAD models with this dataset.	es_ES
dc.description.sponsorship	European Commission 871245 SPRING	es_ES
dc.description.sponsorship	Multidisciplinary Institute in Artificial Intelligence (MIAI) ANR-19-P3IA-0003	es_ES
dc.language.iso	eng	es_ES
dc.publisher	IEEE	es_ES
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Spain	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/es/	*
dc.title	Learning Visual Voice Activity Detection with an Automatically Annotated Dataset	es_ES
dc.type	conference output	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/EC/H2020/871245	es_ES
dc.rights.accessRights	open access	es_ES
dc.type.hasVersion	SMUR	es_ES

Files in this item

Name: Learning_Visual_Voice_Activity ...
Size: 6.407 MB
Format: PDF

This item appears in the following collection(s)

OpenAIRE (Open Access Infrastructure for Research in Europe)
Publications funded by Framework Programme 7, Horizon 2020, Horizon Europe... of the European Research Council of the European Union within the framework of the OpenAIRE Project, which promotes open access in Europe.

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Spain