Event-based Vision for Early Prediction of Manipulation Actions

Déniz Cerpa, José Daniel; Fermüller, Cornelia; Ros Vidal, Eduardo; Rodríguez Álvarez, Manuel; Barranco Expósito, Francisco

doi:10.48550/arXiv.2307.14332

dc.contributor.author	Déniz Cerpa, José Daniel
dc.contributor.author	Fermüller, Cornelia
dc.contributor.author	Ros Vidal, Eduardo
dc.contributor.author	Rodríguez Álvarez, Manuel
dc.contributor.author	Barranco Expósito, Francisco
dc.date.accessioned	2026-03-04T07:34:42Z
dc.date.available	2026-03-04T07:34:42Z
dc.date.issued	2023-07-26
dc.identifier.citation	Daniel Deniz, Cornelia Fermuller, Eduardo Ros, Manuel Rodriguez-Alvarez, Francisco Barranco. Event-based Vision for Early Prediction of Manipulation Actions. DOI: https://doi.org/10.48550/arXiv.2307.14332	es_ES
dc.identifier.uri	https://hdl.handle.net/10481/111859
dc.description	This work was supported by the Spanish National Grant PID2019-109434RA-I00/SRA (State Research Agency/10.13039/501100011033). We acknowledge the Telluride Neuromorphic Cognition Engineering Workshop (http://www.ine-web.org), supported by NSF grant OISE 2020624, for the fruitful discussions on neuromorphic cognition, and its participants for helping with the recording of the dataset.	es_ES
dc.description.abstract	Neuromorphic visual sensors are artificial retinas that output sequences of asynchronous events when brightness changes occur in the scene. These sensors offer many advantages, including very high temporal resolution, no motion blur, and smart data compression, which make them ideal for real-time processing. In this study, we introduce an event-based dataset of fine-grained manipulation actions and perform an experimental study on the use of Transformers for action prediction with events. There is enormous interest in the fields of cognitive robotics and human-robot interaction in understanding and predicting human actions as early as possible. Early prediction allows complex stages to be anticipated for planning, enabling effective, real-time interaction. Our Transformer network uses events to predict manipulation actions as they occur, using online inference. The model succeeds at predicting actions early on, building up confidence over time and achieving state-of-the-art classification. Moreover, the attention-based Transformer architecture allows us to study the role of the spatio-temporal patterns selected by the model. Our experiments show that the Transformer network captures dynamic action features, outperforming video-based approaches and succeeding in scenarios where the differences between actions lie in very subtle cues. Finally, we release the new event dataset, which is the first event-based dataset in the literature for manipulation action recognition.	es_ES
dc.description.sponsorship	Spanish National Grant PID2019-109434RA-I00/SRA	es_ES
dc.description.sponsorship	NSF OISE 2020624	es_ES
dc.language.iso	eng	es_ES
dc.publisher	Cornell University	es_ES
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject	Event-based vision	es_ES
dc.subject	Online prediction	es_ES
dc.subject	Manipulation action prediction	es_ES
dc.title	Event-based Vision for Early Prediction of Manipulation Actions	es_ES
dc.type	conference output	es_ES
dc.rights.accessRights	open access	es_ES
dc.identifier.doi	10.48550/arXiv.2307.14332
dc.type.hasVersion	SMUR	es_ES

Files in this item

Name: Event-based_Vision_for_Early_P ...
Size: 4.627 MB
Format: PDF

This item appears in the following collection(s)

DICAR - Communications: Congresses, Conferences, ...

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International