Paper information - One shot learning gesture recognition from RGBD images

One shot learning gesture recognition from RGBD images

We present a system that classifies gestures from only a single training example. The input is dual-modality: RGB and depth streams from a Kinect sensor. The system applies morphological denoising to the depth images and automatically segments the temporal boundaries of the gestures. Features are extracted with an Extended Motion History Image (Extended-MHI), and the Multi-view Spectral Embedding (MSE) algorithm fuses the two modalities in a physically meaningful manner. Our approach achieves a Levenshtein distance below 0.3 on the CHALEARN Gesture Challenge validation batches [1].
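To make the reported metric concrete, the following is a minimal sketch of the Levenshtein (edit) distance between a predicted and a true sequence of gesture labels. The function names `levenshtein` and `batch_score` are hypothetical, and the normalization in `batch_score` (total edits divided by the total number of true gestures) is an assumption about how the challenge score is computed, not something stated in the abstract.

```python
def levenshtein(a, b):
    """Edit distance between two sequences.

    Insertions, deletions, and substitutions each cost 1.
    """
    # prev[j] holds the distance between the first i-1 items of a
    # and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty sequence
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x from a
                curr[j - 1] + 1,     # insert y into a
                prev[j - 1] + cost,  # substitute x with y (or match)
            ))
        prev = curr
    return prev[-1]


def batch_score(predicted, truth):
    """Assumed normalization: total edits / total number of true gestures."""
    total_edits = sum(levenshtein(p, t) for p, t in zip(predicted, truth))
    total_gestures = sum(len(t) for t in truth)
    return total_edits / total_gestures


if __name__ == "__main__":
    # Two hypothetical videos, each labelled with a sequence of gesture IDs.
    predicted = [[1, 2, 3], [4]]
    truth = [[1, 3], [4, 5]]
    print(batch_score(predicted, truth))
```

Under this assumed normalization, a score below 0.3 would mean fewer than three label edits per ten true gestures on average.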

[1] N. Otsu. A threshold selection method from gray level histograms, 1979.

[2] Bernhard E. Boser, et al. A training algorithm for optimal margin classifiers, 1992, COLT '92.

[3] James W. Davis, et al. The Recognition of Human Movement Using Temporal Templates, 2001, IEEE Trans. Pattern Anal. Mach. Intell.

[4] Ivan Laptev, et al. On Space-Time Interest Points, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[5] Leo Breiman, et al. Random Forests, 2001, Machine Learning.

[6] Andrew Zisserman, et al. Minimal Training, Large Lexicon, Unconstrained Sign Language Recognition, 2004, BMVC.

[7] Barbara Caputo, et al. Recognizing human actions: a local SVM approach, 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004.

[8] Thomas Serre, et al. Object recognition with features inspired by visual cortex, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[9] Karl-Friedrich Kraiss, et al. Robust Person-Independent Visual Sign Language Recognition, 2005, IbPRIA.

[10] Serge J. Belongie, et al. Behavior recognition via sparse spatio-temporal features, 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[11] Bill Triggs, et al. Histograms of oriented gradients for human detection, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[12] Rémi Ronfard, et al. Automatic Discovery of Action Taxonomies from Multiple Views, 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[13] Ramakant Nevatia, et al. Single View Human Action Recognition using Key Pose Matching and Viterbi Path Searching, 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[14] Ulrike von Luxburg, et al. A tutorial on spectral clustering, 2007, Stat. Comput.

[15] Cordelia Schmid, et al. Learning realistic human actions from movies, 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[16] Cordelia Schmid, et al. A Spatio-Temporal Descriptor Based on 3D-Gradients, 2008, BMVC.

[17] Stan Sclaroff, et al. The American Sign Language Lexicon Video Dataset, 2008, 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[18] Xinghua Sun, et al. Action recognition via local descriptors and holistic features, 2009, 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[19] Jake K. Aggarwal, et al. Stochastic Representation and Recognition of High-Level Group Activities, 2011, International Journal of Computer Vision.

[20] Cordelia Schmid, et al. Evaluation of Local Spatio-temporal Features for Action Recognition, 2009, BMVC.

[21] Cor J. Veenman, et al. Visual Word Ambiguity, 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22] Yongdong Zhang, et al. Multiview Spectral Embedding, 2010, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[23] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, TIST.

[24] Sangyoun Lee, et al. 3D hand tracking using Kalman filter in depth space, 2012, EURASIP J. Adv. Signal Process.

[25] Ling Shao, et al. Silhouette Analysis-Based Action Recognition Via Exploiting Human Poses, 2013, IEEE Transactions on Circuits and Systems for Video Technology.

[26] Andrew W. Fitzgibbon, et al. Real-time human pose recognition in parts from single depth images, 2011, CVPR 2011.