Fusing shape and spatio-temporal features for depth-based dynamic hand gesture recognition

Hand gesture recognition has many practical applications, including human-computer interfaces. Many depth-based features have been proposed for dynamic hand gesture recognition. However, performance remains unsatisfactory because these features cannot efficiently capture both effective shape information and the detailed variation of hands in the spatial and temporal domains. In this paper, we propose a new descriptor, DLEH2, for depth-based dynamic hand gesture recognition. It is designed around the characteristics of dynamic hand gestures and fuses simple shape features with spatio-temporal features of depth sequences. For shape information, depth motion maps (DMMs) are first employed to obtain the 3D structure and shape of hands. To enhance critical shape cues, the local texture and edge information of the three DMMs of a hand gesture sequence is captured using the DLE descriptor. However, DMMs compress the temporal information of a depth sequence into the spatial domain, which to some degree loses cues that are critical for discriminating temporal sequences. Simple but effective spatio-temporal features, HOG2, are therefore concatenated with DLE to compensate for the temporal information lost during DMM generation and to capture the detailed spatial and temporal variation of hands. Experimental results on two public benchmark datasets (99.10% on MSRGesture3D and 98.43% on SKIG) show that the proposed fusion scheme outperforms state-of-the-art methods.
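To make the two ingredients of the pipeline concrete, the following is a minimal sketch of how three DMMs can be accumulated from a depth sequence and how two feature vectors can be fused by concatenation. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `depth_bins` parameter, the binary side and top projections, and the L2 normalisation before concatenation are all hypothetical choices, and the DLE and HOG2 extraction steps themselves are not shown.

```python
import numpy as np

def project_views(frame, depth_bins):
    """Project one depth frame onto front, side and top planes.

    Assumes `frame` is an H x W integer array with values in
    [0, depth_bins), where 0 marks background (an assumption).
    """
    h, w = frame.shape
    front = frame.astype(np.float64)      # x-y plane: the depth image itself
    side = np.zeros((h, depth_bins))      # y-z plane: binary occupancy
    top = np.zeros((depth_bins, w))       # z-x plane: binary occupancy
    ys, xs = np.nonzero(frame)
    zs = frame[ys, xs]
    side[ys, zs] = 1.0
    top[zs, xs] = 1.0
    return front, side, top

def depth_motion_maps(seq, depth_bins=64):
    """Accumulate absolute frame-to-frame differences per projected view.

    `seq` is a T x H x W integer array. Returns [front, side, top] DMMs.
    """
    views = [project_views(f, depth_bins) for f in seq]
    dmms = []
    for v in range(3):
        stack = np.stack([p[v] for p in views])           # T x rows x cols
        dmms.append(np.abs(np.diff(stack, axis=0)).sum(axis=0))
    return dmms

def fuse(shape_feat, st_feat, eps=1e-12):
    """Concatenate two feature vectors after L2 normalisation.

    The normalisation is an assumption made here so that neither
    vector dominates; the source only states that they are concatenated.
    """
    a = shape_feat / (np.linalg.norm(shape_feat) + eps)
    b = st_feat / (np.linalg.norm(st_feat) + eps)
    return np.concatenate([a, b])

# Toy sequence: a single pixel at depth 3 moving one column per frame.
seq = np.zeros((3, 4, 5), dtype=np.int64)
seq[0, 1, 1] = 3
seq[1, 1, 2] = 3
seq[2, 1, 3] = 3
dmm_front, dmm_side, dmm_top = depth_motion_maps(seq, depth_bins=8)

# Placeholder vectors standing in for DLE and HOG2 features.
fused = fuse(dmm_front.ravel(), np.ones(6))
```

In this toy sequence the pixel moves only horizontally, so the side-view DMM stays zero while the front and top views record the motion. The accumulated maps also no longer show the order in which the motion happened, which is the temporal loss that the spatio-temporal features are meant to compensate for.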
