Conditional distance based matching for one-shot gesture recognition

This paper considers the problem of matching gestures when only one or a few samples per class are available. The proposed approach shows that much better results are achieved if, instead of comparing two gesture sequences directly, we compare the patterns of frame-wise distances that each of them has with a third (anchor) sequence from the model base. We call this measure the conditional distance, and we refer to these distance patterns as "warp vectors". If the warp vectors are similar, then so are the sequences; if not, the sequences are dissimilar. At the algorithmic core are two dynamic time warping (DTW) processes: one computes the warp vectors with respect to the anchor sequences, and the other compares these warp vectors. To reduce the computational complexity, we propose a speedup strategy that pre-selects "good" anchor sequences. Conditional distance is used for both individual and sentence-level gesture matching, and we evaluate it on single-subject and multiple-subject datasets. Experiments show improved performance, above 82%, on a dataset spanning 179 classes.

Highlights
- We propose a new distance measure, called conditional distance, between two gesture sequences for the case where only one or a few samples per gesture class are available.
- Conditional distance is the distance between the query and model gesture sequences in the presence of a third (anchor) gesture sequence.
- We propose a speedup strategy for computing conditional distances by pre-selecting the anchor.
- We also propose a conditional-distance-based method for simultaneous gesture segmentation and recognition, called conditional level building.
- We show results of 82% on a multiple-subject dataset spanning 179 classes.
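The two-stage DTW described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. We assume that each frame is a fixed-length feature vector compared with Euclidean distance, that a warp vector is the sequence of local frame distances along the optimal DTW path between a gesture and the anchor, and that scores from several anchors are combined by a simple mean. The function names `dtw_path`, `warp_vector`, `conditional_distance`, and `classify` are hypothetical, and the anchor pre-selection speedup is not shown.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def dtw_path(a: Sequence, b: Sequence, dist: Callable) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic DTW; returns the total alignment cost and the optimal warping path."""
    if not a or not b:
        raise ValueError("sequences must be non-empty")
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from (n, m) to (1, 1) to recover the 0-indexed frame pairs
    i, j, path = n, m, []
    while True:
        path.append((i - 1, j - 1))
        if i == 1 and j == 1:
            break
        steps = [(D[i - 1][j - 1], i - 1, j - 1), (D[i - 1][j], i - 1, j), (D[i][j - 1], i, j - 1)]
        _, i, j = min(steps)
    path.reverse()
    return D[n][m], path


def warp_vector(gesture: Sequence, anchor: Sequence) -> List[float]:
    """ASSUMPTION: local frame distances along the DTW path from gesture to anchor."""
    _, path = dtw_path(gesture, anchor, math.dist)
    return [math.dist(gesture[i], anchor[j]) for i, j in path]


def conditional_distance(query: Sequence, model: Sequence, anchor: Sequence) -> float:
    """Second DTW: compare the two warp vectors as scalar sequences."""
    wq = warp_vector(query, anchor)
    wm = warp_vector(model, anchor)
    cost, _ = dtw_path(wq, wm, lambda x, y: abs(x - y))
    return cost


def classify(query: Sequence, models: Dict[str, Sequence], anchors: List[Sequence]) -> str:
    """Pick the model label with the smallest mean conditional distance (assumed aggregation)."""
    def score(model: Sequence) -> float:
        return sum(conditional_distance(query, model, a) for a in anchors) / len(anchors)
    return min(models, key=lambda label: score(models[label]))


if __name__ == "__main__":
    anchor = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
    models = {
        "wave": [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]],
        "push": [[0.0, 5.0], [0.0, 6.0], [0.0, 7.0]],
    }
    query = [[0.0, 0.0], [1.0, 0.0], [1.0, 0.0], [2.0, 0.0]]  # time-stretched "wave"
    print(classify(query, models, [anchor]))
```

In this toy example the time-stretched query aligns to the anchor with zero local cost, so its warp vector matches that of the hypothetical `"wave"` model even though the two sequences have different lengths, and the query is assigned to `"wave"`.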
