Learning sparse representation for dynamic gesture recognition

Gesture recognition is an important task in gesture-based human-computer interaction. This paper proposes a novel gesture recognition model based on sparse representation. The model consists of four stages. First, spatial-temporal interest points are detected in the video sequences. Second, a cuboid is constructed around each spatial-temporal interest point, and 3D SIFT features are extracted from the cuboids. Third, the local 3D SIFT features are encoded within the sparse coding framework, so that each local 3D SIFT descriptor is expressed as a linear combination of a few atoms from a pre-trained dictionary. Finally, a max pooling strategy produces the final representation of a video, and a multi-class linear SVM performs the classification. We test the model on a video dataset that we collected ourselves and obtain good performance.
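
The encoding and pooling stages can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a sparse coding solver, so greedy matching pursuit is used here as a simple stand-in, and the three-atom dictionary, the two toy descriptors, and the function names `matching_pursuit` and `max_pool` are all assumptions made for illustration. Real 3D SIFT descriptors and a learned dictionary would be much higher-dimensional.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(x, dictionary, n_nonzero=2):
    """Greedy sparse code of x over unit-norm atoms (stand-in solver, an assumption)."""
    residual = list(x)
    code = [0.0] * len(dictionary)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with what is still unexplained.
        corrs = [dot(residual, atom) for atom in dictionary]
        k = max(range(len(corrs)), key=lambda i: abs(corrs[i]))
        if abs(corrs[k]) < 1e-12:
            break  # nothing left to explain
        code[k] += corrs[k]
        residual = [r - corrs[k] * a for r, a in zip(residual, dictionary[k])]
    return code


def max_pool(codes):
    """Element-wise max of absolute coefficients over all local descriptors."""
    return [max(abs(c[j]) for c in codes) for j in range(len(codes[0]))]


# Hypothetical toy data: 3 unit-norm atoms and 2 local descriptors from one video.
dictionary = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
descriptors = [[2.0, 0.0, 0.5], [0.0, -3.0, 0.0]]

codes = [matching_pursuit(d, dictionary) for d in descriptors]
video_vector = max_pool(codes)  # one fixed-length vector per video
print(video_vector)  # [2.0, 3.0, 0.5]
```

Pooling turns a variable number of local codes into one vector whose length equals the dictionary size, which is what lets a linear classifier such as the multi-class SVM operate on whole videos. The classifier itself is omitted from this sketch.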
