Learning Visual Voice Activity Detection with an Automatically Annotated Dataset
dc.contributor.author | Guy, Sylvain | |
dc.contributor.author | Lathuilière, Stéphane | |
dc.contributor.author | Mesejo Santiago, Pablo | |
dc.contributor.author | Horaud, Radu | |
dc.date.accessioned | 2021-10-04T07:11:17Z | |
dc.date.available | 2021-10-04T07:11:17Z | |
dc.date.issued | 2020-10-16 | |
dc.identifier.citation | Published version: S. Guy... [et al.]. "Learning Visual Voice Activity Detection with an Automatically Annotated Dataset," 2020 25th International Conference on Pattern Recognition (ICPR), 2021, pp. 4851-4856, doi: [10.1109/ICPR48806.2021.9412884] | es_ES |
dc.identifier.uri | http://hdl.handle.net/10481/70588 | |
dc.description | This work has been funded by the EU H2020 project #871245 SPRING and by the Multidisciplinary Institute in Artificial Intelligence (MIAI) #ANR-19-P3IA-0003. | es_ES |
dc.description.abstract | Visual voice activity detection (V-VAD) uses visual features to predict whether a person is speaking or not. V-VAD is useful whenever audio VAD (A-VAD) is inefficient, either because the acoustic signal is difficult to analyze or because it is simply missing. We propose two deep architectures for V-VAD, one based on facial landmarks and one based on optical flow. Moreover, available datasets, used for learning and for testing V-VAD, lack content variability. We introduce a novel methodology to automatically create and annotate very large datasets in the wild (WildVVAD), based on combining A-VAD with face detection and tracking. A thorough empirical evaluation shows the advantage of training the proposed deep V-VAD models with this dataset. | es_ES |
dc.description.sponsorship | European Commission 871245 SPRING | es_ES |
dc.description.sponsorship | Multidisciplinary Institute in Artificial Intelligence (MIAI) ANR-19-P3IA-0003 | es_ES |
dc.language.iso | eng | es_ES |
dc.publisher | IEEE | es_ES |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 Spain | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/es/ | * |
dc.title | Learning Visual Voice Activity Detection with an Automatically Annotated Dataset | es_ES |
dc.type | conference output | es_ES |
dc.relation.projectID | info:eu-repo/grantAgreement/EC/H2020/871245 | es_ES |
dc.rights.accessRights | open access | es_ES |
dc.type.hasVersion | SMUR | es_ES |
This item appears in the following collection(s)
- OpenAIRE (Open Access Infrastructure for Research in Europe)
Publications funded by Framework Programme 7, Horizon 2020, Horizon Europe... of the European Research Council of the European Union, within the framework of the OpenAIRE Project, which promotes open access in Europe.