Deep Convolutional and LSTM Neural Network Architectures on Leap Motion Hand Tracking Data Sequences

This paper addresses the hand gesture recognition problem, in which the input is a multidimensional time series signal acquired from a Leap Motion sensor and the output is one of a predefined set of gestures. We propose the use of Convolutional Neural Networks (CNNs), either combined with a Long Short-Term Memory (LSTM) neural network (CNN-LSTM) or standalone in a deep architecture (dCNN), to automate feature learning and classification from the raw input data. The learned features serve as a higher-level abstract representation of the low-level raw time series signals and are used in a unified supervised learning and classification model. The proposed CNN-LSTM and deep CNN models achieve recognition rates of 94% on the Leap Motion Hand Gestures for Interaction with 3D Virtual Music Instruments dataset, outperforming previously proposed models that apply LSTM networks to handcrafted or automatically learned features.
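
To make the CNN-LSTM idea concrete, the following is a minimal sketch of a forward pass: a 1D convolution over time extracts local features from the raw multichannel sequence, an LSTM summarizes those features, and a softmax layer yields gesture probabilities. It is an illustration only and does not reproduce the paper's architecture. All sizes (6 channels, 20 frames, 4 filters, kernel width 3, 5 hidden units, 3 classes), the function names, and the random untrained weights are assumptions chosen for brevity.

```python
import math
import random

def conv1d_relu(seq, kernels, bias):
    """Valid 1D convolution over time, followed by ReLU.
    seq: T frames, each a list of C channel values.
    kernels: F filters, each K taps of C weights.
    Returns T-K+1 frames of F feature values."""
    k = len(kernels[0])
    out = []
    for t in range(len(seq) - k + 1):
        frame = []
        for f_idx, kern in enumerate(kernels):
            s = bias[f_idx]
            for i in range(k):
                for c, w in enumerate(kern[i]):
                    s += w * seq[t + i][c]
            frame.append(max(0.0, s))
        out.append(frame)
    return out

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_last_hidden(seq, W, U, b, hidden):
    """Single-layer LSTM; returns the final hidden state.
    Rows of W, U, b are stacked in gate order: input, forget, output, candidate."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    for x in seq:
        z = [b[r]
             + sum(W[r][j] * x[j] for j in range(len(x)))
             + sum(U[r][j] * h[j] for j in range(hidden))
             for r in range(4 * hidden)]
        i_g = [sigmoid(v) for v in z[0:hidden]]
        f_g = [sigmoid(v) for v in z[hidden:2 * hidden]]
        o_g = [sigmoid(v) for v in z[2 * hidden:3 * hidden]]
        g = [math.tanh(v) for v in z[3 * hidden:]]
        c = [f_g[n] * c[n] + i_g[n] * g[n] for n in range(hidden)]
        h = [o_g[n] * math.tanh(c[n]) for n in range(hidden)]
    return h

def softmax(logits):
    m = max(logits)
    e = [math.exp(v - m) for v in logits]
    total = sum(e)
    return [v / total for v in e]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def init_params(channels, filters, kernel, hidden, classes, seed=0):
    """Random, untrained weights (hypothetical sizes, for shape checking only)."""
    rng = random.Random(seed)
    return {
        "kernels": [rand_matrix(kernel, channels, rng) for _ in range(filters)],
        "conv_bias": [0.0] * filters,
        "W": rand_matrix(4 * hidden, filters, rng),
        "U": rand_matrix(4 * hidden, hidden, rng),
        "b": [0.0] * (4 * hidden),
        "hidden": hidden,
        "out_W": rand_matrix(classes, hidden, rng),
        "out_b": [0.0] * classes,
    }

def cnn_lstm_forward(seq, p):
    """Convolutional features -> LSTM summary -> class probabilities."""
    feats = conv1d_relu(seq, p["kernels"], p["conv_bias"])
    h = lstm_last_hidden(feats, p["W"], p["U"], p["b"], p["hidden"])
    logits = [p["out_b"][k] + sum(w * v for w, v in zip(p["out_W"][k], h))
              for k in range(len(p["out_W"]))]
    return softmax(logits)

# Synthetic stand-in for a tracked hand sequence: 20 frames x 6 channels.
data_rng = random.Random(1)
sequence = [[data_rng.uniform(-1, 1) for _ in range(6)] for _ in range(20)]
params = init_params(channels=6, filters=4, kernel=3, hidden=5, classes=3)
probs = cnn_lstm_forward(sequence, params)
print(probs)
```

Because the weights are untrained, the printed probabilities carry no meaning; the sketch only shows how the convolutional stage shortens and re-encodes the sequence before the recurrent stage consumes it. In practice such a model would be built and trained with a deep learning framework rather than written by hand.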
