Gesture and Action Recognition via Modeling Trajectories on Riemannian Manifolds

This paper addresses the problem of recognizing human gestures from videos using models built from the Riemannian geometry of shape spaces. We represent a human gesture as a temporal sequence of human poses, each characterized by the contour of the associated human silhouette. The shape of a contour is viewed as a point on the shape space of closed curves, so each gesture is modeled as a trajectory on this shape space. We propose two approaches for modeling these trajectories. In the first, template-based approach, we use dynamic time warping (DTW) to align the different trajectories using elastic geodesic distances on the shape space. The gesture templates are then computed by averaging the aligned trajectories. In the second approach, we use a graphical model similar to an exemplar-based hidden Markov model: we cluster the gesture shapes on the shape space and build non-parametric statistical models to capture the variations within each cluster. Each gesture is then modeled as a Markov model of transitions between these clusters. To evaluate the proposed approaches, we performed an extensive set of experiments on two data sets representing gesture and action recognition applications. The proposed approaches not only represent the shape and dynamics of the different classes well enough for recognition, but are also robust to some of the errors introduced by segmentation and background subtraction.
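As an illustration of the alignment step in the template-based approach, the following is a minimal sketch of classic dynamic time warping between two pose sequences. It is not the paper's implementation: the function name `dtw_align` and the `dist` argument are hypothetical, and `dist` stands in for the elastic geodesic distance on the shape space, which is not implemented here. Any non-negative pairwise distance can be plugged in.

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

Pose = TypeVar("Pose")


def dtw_align(
    a: Sequence[Pose],
    b: Sequence[Pose],
    dist: Callable[[Pose, Pose], float],
) -> Tuple[float, List[Tuple[int, int]]]:
    """Align two pose sequences with standard DTW.

    `dist` is a placeholder for the elastic geodesic distance between
    two shapes (an assumption of this sketch, supplied by the caller).
    Returns the accumulated cost and the warping path as index pairs.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # cost[i][j] = best accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance in both sequences
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
            )

    # Backtrack from the end to recover the warping path.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates, key=lambda c: c[0])
    path.reverse()
    return cost[n][m], path


# Toy usage: scalars stand in for shapes, absolute difference for the
# geodesic distance (both are assumptions made for illustration only).
total, path = dtw_align([0.0, 1.0, 2.0], [0.0, 0.0, 1.0, 2.0],
                        lambda x, y: abs(x - y))
```

The warping path pairs each frame of one trajectory with one or more frames of the other. A template could then be formed by averaging the paired shapes along that path, with the average itself computed on the shape space rather than in a Euclidean sense.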
