Two-Stream CNNs for Gesture-Based Verification and Identification: Learning User Style

Recently, gestures have been proposed as an alternative biometric modality to traditional biometrics such as face, fingerprint, iris and gait. As a biometric, a gesture is a short body motion that contains both static anatomical information and changing behavioral (dynamic) information. We consider two types of gestures: full-body gestures, such as a wave of the arms, and hand gestures, such as a subtle curl of the fingers and palm. Most prior work in this area evaluates gestures in the context of a "password," where each user has a single, chosen gesture motion. In contrast, we aim to learn a user's gesture "style" from a set of training gestures. To learn this style, we use two-stream convolutional neural networks, a form of deep learning. First, we evaluate how well our approach generalizes at test time to gestures or users that were not seen during training. Then, we study the importance of dynamics by suppressing dynamic information during both training and testing. Our approach outperforms state-of-the-art methods in both identification and verification on two biometrics-oriented datasets of body and in-air hand gestures.
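The abstract does not say how the two streams are combined or how identification and verification decisions are made. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes late fusion by a weighted average of per-user softmax scores from a static (spatial) stream and a dynamic (temporal) stream, which is a common choice for two-stream networks. It treats identification as one-to-many (which enrolled user performed this gesture?) and verification as one-to-one (is this the claimed user?). The fusion weight `w_temporal` and the verification `threshold` are hypothetical parameters. Setting `w_temporal` to 0 ignores the dynamic stream at scoring time. That is only a loose analogue of suppressing dynamics, since the paper suppresses dynamic information during training as well.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over per-user logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(spatial_logits: Sequence[float],
                 temporal_logits: Sequence[float],
                 w_temporal: float = 0.5) -> List[float]:
    """Late fusion (assumed): weighted average of the two streams' softmax scores.

    w_temporal is a hypothetical fusion weight; 0.0 uses only the static stream.
    """
    if not 0.0 <= w_temporal <= 1.0:
        raise ValueError("w_temporal must be in [0, 1]")
    if len(spatial_logits) != len(temporal_logits):
        raise ValueError("both streams must score the same set of users")
    s = softmax(spatial_logits)
    t = softmax(temporal_logits)
    return [(1.0 - w_temporal) * a + w_temporal * b for a, b in zip(s, t)]


def identify(scores: Sequence[float]) -> int:
    """Identification: index of the enrolled user with the highest fused score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def verify(scores: Sequence[float], claimed_user: int, threshold: float) -> bool:
    """Verification: accept the claim if the claimed user's fused score meets a hypothetical threshold."""
    return scores[claimed_user] >= threshold


if __name__ == "__main__":
    # Made-up logits for three enrolled users, for illustration only.
    scores = fuse_streams([2.0, 0.5, 0.1], [1.5, 0.2, 0.3])
    print(identify(scores))                 # prints 0
    print(verify(scores, 0, threshold=0.5)) # prints True
    print(verify(scores, 1, threshold=0.5)) # prints False
```

Averaging probabilities instead of raw logits keeps both streams on a common [0, 1] scale. This makes a single verification threshold meaningful regardless of how confident each stream is individually.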
