Similarity Measure between Two Gestures Using Triplets

One of the dominant approaches to gesture recognition, especially when only one or a few samples are available per class, is to compute the time-warped distance between two sequences and perform nearest-neighbor classification. In this work, we show that much better results are obtained by instead considering the similarity between the patterns of frame-wise distances of these two sequences with respect to a third (anchor) sequence from the modelbase. We refer to these distance pattern vectors as warp vectors. If the warp vectors are similar, then so are the sequences; if not, the sequences are dissimilar. At its algorithmic core, the method has two dynamic time warping processes: one computes the warp vectors with respect to the anchor sequence, and the other compares these warp vectors. We select as the anchor the sequence that minimizes the overall distance, i.e., the sequence with respect to which the two sequences are most similar. We present results on a large dataset of 1500 RGBD sequences spanning 150 gesture classes, such as traffic signals, sign language, and everyday actions, extracted from the ChaLearn Gesture Challenge dataset. We experimented with three feature types: difference of frames, HOG, and relational distributions. We found improvements of 5%, 15%, and 7%, respectively, at a 20% false alarm rate over the traditional two-sequence time-warped distance.
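The two-stage warping described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a warp vector is the sequence of frame-wise Euclidean distances collected along the optimal DTW path between a sequence and the anchor, that the two warp vectors are compared with a second, unnormalized DTW, and that the anchor is chosen by taking the minimum over the modelbase. The function names `dtw` and `triplet_distance` are hypothetical.

```python
import numpy as np


def dtw(x, y):
    """Classic DTW between two sequences of feature vectors.

    Returns (total_cost, warp_vector), where warp_vector holds the
    frame-wise distances along the optimal warping path (an assumed
    reading of the "warp vector" in the abstract).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:  # treat a 1-D input as a sequence of scalars
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    n, m = len(x), len(y)

    # Pairwise frame-to-frame Euclidean distances, shape (n, m).
    local = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)

    # Accumulated cost with a one-cell border of infinities.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )

    # Backtrack from the end to recover the local costs on the path.
    i, j = n, m
    path_costs = []
    while i > 0 and j > 0:
        path_costs.append(local[i - 1, j - 1])
        steps = (acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
        move = int(np.argmin(steps))
        if move == 0:
            i, j = i - 1, j - 1
        elif move == 1:
            i -= 1
        else:
            j -= 1
    return acc[n, m], np.array(path_costs[::-1])


def triplet_distance(a, b, modelbase):
    """Distance between a and b via the best anchor in modelbase.

    For each anchor, warp a and b to it, then compare the two warp
    vectors with a second DTW; keep the smallest resulting distance.
    """
    best = np.inf
    for anchor in modelbase:
        _, warp_a = dtw(a, anchor)
        _, warp_b = dtw(b, anchor)
        dist, _ = dtw(warp_a, warp_b)
        best = min(best, dist)
    return best
```

In this sketch, nearest-neighbor classification would assign a query sequence the label of the modelbase sequence with the smallest `triplet_distance`. Whether the anchor set should exclude the two sequences being compared, and whether the distances should be normalized by path length, are design choices the abstract does not specify.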
