Few-Shot User-Definable Radar-Based Hand Gesture Recognition at the Edge
Keywords: Artificial neural networks; Edge computing; FMCW; Intel Neural Compute Stick; Knowledge transfer; Meta-learning; Human-computer interaction; Radar; Variational autoencoder
G. Mauro et al., "Few-Shot User-Definable Radar-Based Hand Gesture Recognition at the Edge," IEEE Access, vol. 10, pp. 29741-29759, 2022, doi: 10.1109/ACCESS.2022.3155124
Sponsorship: Federal Ministry of Education & Research (BMBF) 19006; Austrian Research Promotion Agency (FFG); Rijksdienst voor Ondernemend Nederland (Rvo); Innovation Fund Denmark (IFD)
Technological advances and scalability are driving Human-Computer Interaction (HCI) toward more intuitive forms, such as gesture recognition. Among the various interaction strategies, radar-based recognition is emerging as a touchless, privacy-preserving solution that remains versatile across environmental conditions. Classical radar-based gesture HCI solutions rely on deep learning but require training on large and varied datasets to achieve robust prediction. Innovative self-learning algorithms can help tackle this problem by recognizing patterns and adapting from similar contexts. Yet such approaches are often computationally expensive and difficult to integrate into hardware-constrained solutions. In this paper, we present a gesture recognition algorithm that is easily adaptable to new users and contexts. We exploit an optimization-based meta-learning approach to enable gesture recognition from short learning sequences. This method aims to learn the best possible initialization of the model parameters, simplifying training on new contexts when only small amounts of data are available. The reduction in computational cost is achieved by processing the radar-sensed gesture data in the form of time maps, which minimizes the input data size. This approach enables a simple convolutional neural network (CNN) to adapt to new hand poses, easing the integration of the model into a hardware-constrained platform. Moreover, using a Variational Autoencoder (VAE) to reduce the gestures' dimensionality shrinks the model size by an order of magnitude and halves the required adaptation time. The proposed framework, deployed on the Intel(R) Neural Compute Stick 2 (NCS 2), achieves an average accuracy of around 84% on unseen gestures when only one example per class is used at training time. The accuracy increases to 92.6% and 94.2% when three and five samples per class are used, respectively.
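The core idea of the optimization-based meta-learning described above (learning a parameter initialization that adapts quickly from a few labeled examples) can be sketched with a first-order MAML-style loop. This is a minimal toy illustration on 1-D regression with NumPy, not the paper's CNN, radar data, or exact algorithm; all function names and hyperparameters here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task():
    # Hypothetical toy "task": 1-D linear regression y = w * x,
    # where each task draws its own slope w (stand-in for a new user/gesture context).
    w = rng.uniform(-2.0, 2.0)
    xs = rng.uniform(-1.0, 1.0, size=10)
    return xs, w * xs

def mse(theta, xs, ys):
    # Mean squared error of the one-parameter model y_hat = theta * x.
    return np.mean((theta * xs - ys) ** 2)

def loss_grad(theta, xs, ys):
    # Analytic gradient of the MSE above with respect to theta.
    return 2.0 * np.mean((theta * xs - ys) * xs)

def meta_train(meta_steps=500, inner_lr=0.1, meta_lr=0.05, k_shot=3):
    # Learn an initialization theta that adapts well after ONE inner
    # gradient step on a k-shot support set (first-order MAML flavor).
    theta = 0.0
    for _ in range(meta_steps):
        xs, ys = make_task()
        sup_x, sup_y = xs[:k_shot], ys[:k_shot]      # few-shot support set
        qry_x, qry_y = xs[k_shot:], ys[k_shot:]      # query set
        # Inner loop: task-specific adaptation from the shared initialization.
        theta_task = theta - inner_lr * loss_grad(theta, sup_x, sup_y)
        # Outer loop (first-order): improve the initialization using the
        # adapted parameters' loss on the query set.
        theta = theta - meta_lr * loss_grad(theta_task, qry_x, qry_y)
    return theta

theta0 = meta_train()
# Adapting to an unseen task with one gradient step should reduce its loss.
new_x = np.linspace(-1.0, 1.0, 8)
new_y = 1.5 * new_x
adapted = theta0 - 0.1 * loss_grad(theta0, new_x, new_y)
assert mse(adapted, new_x, new_y) < mse(theta0, new_x, new_y)
```

In the paper's setting, the single parameter `theta` would correspond to the weights of the small CNN operating on radar time maps, and each "task" would be a new user's gesture set with one, three, or five examples per class.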