Event-based Vision for Early Prediction of Manipulation Actions
Metadata
Author
Déniz Cerpa, José Daniel; Fermüller, Cornelia; Ros Vidal, Eduardo; Rodríguez Álvarez, Manuel; Barranco Expósito, Francisco
Publisher
Cornell University
Subject
Event-based vision; Online prediction; Manipulation action prediction
Date
2023-07-26
Bibliographic reference
Daniel Deniz, Cornelia Fermuller, Eduardo Ros, Manuel Rodriguez-Alvarez, Francisco Barranco. Event-based Vision for Early Prediction of Manipulation Actions. DOI: https://doi.org/10.48550/arXiv.2307.14332
Sponsor
Spanish National Grant PID2019-109434RA-I00/ SRA; NSF OISE 2020624
Abstract
Neuromorphic visual sensors are artificial retinas that output sequences of asynchronous events
when brightness changes occur in the scene. These sensors offer many advantages including very
high temporal resolution, no motion blur and smart data compression ideal for real-time processing.
In this study, we introduce an event-based dataset on fine-grained manipulation actions and
perform an experimental study on the use of transformers for action prediction with events. There is
enormous interest in the fields of cognitive robotics and human-robot interaction in understanding
and predicting human actions as early as possible. Early prediction allows anticipating complex
stages for planning, enabling effective and real-time interaction. Our Transformer network uses
events to predict manipulation actions as they occur, using online inference. The model succeeds
at predicting actions early on, building up confidence over time and achieving state-of-the-art classification.
Moreover, the attention-based transformer architecture allows us to study the role of
the spatio-temporal patterns selected by the model. Our experiments show that the Transformer
network captures the dynamic features of actions, outperforming video-based approaches and succeeding
in scenarios where the differences between actions lie in very subtle cues. Finally, we release the
new event dataset, which is the first in the literature for manipulation action recognition.





