<rdf:RDF xmlns:rdf="http://www.openarchives.org/OAI/2.0/rdf/" xmlns:ow="http://www.ontoweb.org/ontology/1#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:ds="http://dspace.org/ds/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:doc="http://www.lyncode.com/xoai" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/rdf/ http://www.openarchives.org/OAI/2.0/rdf.xsd">
   <ow:Publication rdf:about="oai:digibug.ugr.es:10481/111859">
      <dc:title>Event-based Vision for Early Prediction of Manipulation Actions</dc:title>
      <dc:creator>Déniz Cerpa, José Daniel</dc:creator>
      <dc:creator>Fermüller, Cornelia</dc:creator>
      <dc:creator>Ros Vidal, Eduardo</dc:creator>
      <dc:creator>Rodríguez Álvarez, Manuel</dc:creator>
      <dc:creator>Barranco Expósito, Francisco</dc:creator>
      <dc:subject>Event-based vision</dc:subject>
      <dc:subject>Online prediction</dc:subject>
      <dc:subject>Manipulation action prediction</dc:subject>
      <dc:description>This work was supported by the Spanish National Grant PID2019-109434RA-I00/ SRA (State Research Agency/10.13039/501100011033). We acknowledge the Telluride Neuromorphic Cognition Engineering Workshop (http://www.ine-web.org), supported by NSF grant OISE 2020624, for the fruitful discussions on neuromorphic cognition, and its participants for helping with the recording of the dataset.</dc:description>
      <dc:description>Neuromorphic visual sensors are artificial retinas that output sequences of asynchronous events when brightness changes occur in the scene. These sensors offer many advantages, including very high temporal resolution, no motion blur, and smart data compression ideal for real-time processing. In this study, we introduce an event-based dataset on fine-grained manipulation actions and perform an experimental study on the use of transformers for action prediction with events. There is enormous interest in the fields of cognitive robotics and human-robot interaction in understanding and predicting human actions as early as possible. Early prediction allows anticipating complex stages for planning, enabling effective and real-time interaction. Our Transformer network uses events to predict manipulation actions as they occur, using online inference. The model succeeds at predicting actions early on, building up confidence over time and achieving state-of-the-art classification. Moreover, the attention-based transformer architecture allows us to study the role of the spatio-temporal patterns selected by the model. Our experiments show that the Transformer network captures action dynamic features, outperforming video-based approaches and succeeding in scenarios where the differences between actions lie in very subtle cues. Finally, we release the new event dataset, which is the first in the literature for manipulation action recognition.</dc:description>
      <dc:date>2026-03-04T07:34:42Z</dc:date>
      <dc:date>2023-07-26</dc:date>
      <dc:type>conference output</dc:type>
      <dc:identifier>Daniel Deniz, Cornelia Fermuller, Eduardo Ros, Manuel Rodriguez-Alvarez, Francisco Barranco. Event-based Vision for Early Prediction of Manipulation Actions. DOI: https://doi.org/10.48550/arXiv.2307.14332</dc:identifier>
      <dc:identifier>https://hdl.handle.net/10481/111859</dc:identifier>
      <dc:identifier>10.48550/arXiv.2307.14332</dc:identifier>
      <dc:language>eng</dc:language>
      <dc:rights>http://creativecommons.org/licenses/by-nc-nd/4.0/</dc:rights>
      <dc:rights>open access</dc:rights>
      <dc:rights>Attribution-NonCommercial-NoDerivatives 4.0 International</dc:rights>
      <dc:publisher>Cornell University</dc:publisher>
   </ow:Publication>
</rdf:RDF>