A Low Power, Fully Event-Based Gesture Recognition System

We present the first gesture recognition system implemented end-to-end on event-based hardware, using a TrueNorth neurosynaptic processor to recognize hand gestures in real-time at low power from events streamed live by a Dynamic Vision Sensor (DVS). The biologically inspired DVS transmits data only when a pixel detects a change, unlike traditional frame-based cameras which sample every pixel at a fixed frame rate. This sparse, asynchronous data representation lets event-based cameras operate at much lower power than frame-based cameras. However, much of the energy efficiency is lost if, as in previous work, the event stream is interpreted by conventional synchronous processors. Here, for the first time, we process a live DVS event stream using TrueNorth, a natively event-based processor with 1 million spiking neurons. Configured here as a convolutional neural network (CNN), the TrueNorth chip identifies the onset of a gesture with a latency of 105 ms while consuming less than 200 mW. The CNN achieves 96.5% out-of-sample accuracy on a newly collected DVS dataset (DvsGesture) comprising 11 hand gesture categories from 29 subjects under 3 illumination conditions.

[1]  Cordelia Schmid,et al.  Evaluation of Local Spatio-temporal Features for Action Recognition , 2009, BMVC.

[2]  Cordelia Schmid,et al.  A Spatio-Temporal Descriptor Based on 3D-Gradients , 2008, BMVC.

[3]  Christian Wolf,et al.  Multi-scale Deep Learning for Gesture Detection and Localization , 2014, ECCV Workshops.

[4]  Jock D. Mackinlay,et al.  The information visualizer, an information workspace , 1991, CHI.

[5]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[6]  Luc Van Gool,et al.  An Efficient Dense and Scale-Invariant Spatio-Temporal Interest Point Detector , 2008, ECCV.

[7]  Mohan M. Trivedi,et al.  Hand Gesture Recognition in Real Time for Automotive Interfaces: A Multimodal Vision-Based Approach and Evaluations , 2014, IEEE Transactions on Intelligent Transportation Systems.

[8]  Christian Wolf,et al.  ModDrop: Adaptive Multi-Modal Gesture Recognition , 2014, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  Spencer D. Kelly,et al.  Gesture Gives a Hand to Language and Learning: Perspectives from Cognitive Neuroscience, Developmental Psychology and Education , 2008, Lang. Linguistics Compass.

[10]  Tobi Delbrück,et al.  Real-Time Gesture Interface Based on Event-Driven Processing From Stereo Silicon Retinas , 2014, IEEE Transactions on Neural Networks and Learning Systems.

[11]  Bernabé Linares-Barranco,et al.  ConvNets experiments on SpiNNaker , 2015, 2015 IEEE International Symposium on Circuits and Systems (ISCAS).

[12]  Qing Wu,et al.  Binary image classification using a neurosynaptic processor: A trade-off analysis , 2016, 2016 IEEE International Symposium on Circuits and Systems (ISCAS).

[13]  Garrick Orchard,et al.  Benchmarking neuromorphic vision: lessons learnt from computer vision , 2015, Front. Neurosci..

[14]  Tobi Delbrück,et al.  A 128 X 128 120db 30mw asynchronous vision sensor that responds to relative intensity change , 2006, 2006 IEEE International Solid State Circuits Conference - Digest of Technical Papers.

[15]  Elena Mugellini,et al.  A Survey of Datasets for Human Gesture Recognition , 2014, HCI.

[16]  Andrew S. Cassidy,et al.  A million spiking-neuron integrated circuit with a scalable communication network and interface , 2014, Science.

[17]  John Tran,et al.  cuDNN: Efficient Primitives for Deep Learning , 2014, ArXiv.

[18]  Mubarak Shah,et al.  A 3-dimensional sift descriptor and its application to action recognition , 2007, ACM Multimedia.

[19]  Pavlo Molchanov,et al.  Online Detection and Classification of Dynamic Hand Gestures with Recurrent 3D Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Cordelia Schmid,et al.  Dense Trajectories and Motion Boundary Descriptors for Action Recognition , 2013, International Journal of Computer Vision.

[21]  Brejesh Lall,et al.  Low power gesture recognition system using skin statistics for consumer applications , 2016, 2016 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-TW).

[22]  Karthik Ramani,et al.  DeepHand: Robust Hand Pose Estimation by Completing a Matrix Imputed with Deep Features , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Steve Furber,et al.  Large-scale neuromorphic computing systems , 2016, Journal of neural engineering.

[24]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[25]  Ivan Laptev,et al.  On Space-Time Interest Points , 2005, International Journal of Computer Vision.

[26]  Ling Shao,et al.  Deep Dynamic Neural Networks for Multimodal Gesture Segmentation and Recognition , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Dandu Amarnatha Reddy Vision Based Hand Gesture Recognition for Human Computer Interaction , 2018 .

[28]  Andrea Vedaldi,et al.  MatConvNet: Convolutional Neural Networks for MATLAB , 2014, ACM Multimedia.

[29]  Martin Saerbeck,et al.  Recent methods and databases in vision-based hand gesture recognition: A review , 2015, Comput. Vis. Image Underst..

[30]  Tobi Delbruck,et al.  A 240 × 180 130 dB 3 µs Latency Global Shutter Spatiotemporal Vision Sensor , 2014, IEEE Journal of Solid-State Circuits.

[31]  Luca Maria Gambardella,et al.  Max-pooling convolutional neural networks for vision-based hand gesture recognition , 2011, 2011 IEEE International Conference on Signal and Image Processing Applications (ICSIPA).

[32]  Ming Yang,et al.  3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Robert B. Miller,et al.  Response time in man-computer conversational transactions , 1899, AFIPS Fall Joint Computing Conference.

[34]  Hermann Ney,et al.  Deep Hand: How to Train a CNN on 1 Million Hand Images When Your Data is Continuous and Weakly Labelled , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Daniel Matolin,et al.  A QVGA 143 dB Dynamic Range Frame-Free PWM Image Sensor With Lossless Pixel-Level Video Compression and Time-Domain CDS , 2011, IEEE Journal of Solid-State Circuits.

[36]  Zicheng Liu,et al.  HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Lu Yang,et al.  Survey on 3D Hand Gesture Recognition , 2016, IEEE Transactions on Circuits and Systems for Video Technology.

[38]  S. Mitra,et al.  Gesture Recognition: A Survey , 2007, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[39]  Pavlo Molchanov,et al.  Hand gesture recognition with 3D convolutional neural networks , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[40]  Andrew Zisserman,et al.  Convolutional Two-Stream Network Fusion for Video Action Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Razvan Pascanu,et al.  Combining modality specific deep neural networks for emotion recognition in video , 2013, ICMI '13.

[42]  Tobi Delbrück,et al.  A 128$\times$ 128 120 dB 15 $\mu$s Latency Asynchronous Temporal Contrast Vision Sensor , 2008, IEEE Journal of Solid-State Circuits.

[43]  Andrew S. Cassidy,et al.  Cognitive computing building block: A versatile and efficient digital neuron model for neurosynaptic cores , 2013, The 2013 International Joint Conference on Neural Networks (IJCNN).

[44]  Hongjie Liu,et al.  DVS Benchmark Datasets for Object Tracking, Action Recognition, and Object Recognition , 2016, Front. Neurosci..

[45]  Andrew S. Cassidy,et al.  Convolutional networks for fast, energy-efficient neuromorphic computing , 2016, Proceedings of the National Academy of Sciences.

[46]  Andrew S. Cassidy,et al.  Cognitive computing programming paradigm: A Corelet Language for composing networks of neurosynaptic cores , 2013, The 2013 International Joint Conference on Neural Networks (IJCNN).