Silhouette-based gesture and action recognition via modeling trajectories on Riemannian shape manifolds

This paper addresses the problem of recognizing human gestures from videos using models built from the Riemannian geometry of shape spaces. We represent a human gesture as a temporal sequence of human poses, each characterized by the contour of the associated human silhouette. The shape of a contour is viewed as a point on the shape space of closed curves; hence, each gesture is modeled as a trajectory on this shape space. We propose two approaches for modeling these trajectories. In the first, template-based approach, we use dynamic time warping (DTW) to align the trajectories using elastic geodesic distances on the shape space, and then compute gesture templates by averaging the aligned trajectories. In the second approach, we use a graphical model similar to an exemplar-based hidden Markov model: we cluster the gesture shapes on the shape space, build non-parametric statistical models to capture the variations within each cluster, and model each gesture as a Markov model of transitions between these clusters. To evaluate the proposed approaches, we performed an extensive set of experiments on two data sets representing gesture and action recognition applications. The proposed approaches not only represent the shape and dynamics of the different classes well enough for recognition, but are also robust to some errors resulting from segmentation and background subtraction.
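The alignment step of the template-based approach can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each pose has already been reduced to a unit vector, and it uses the arc-length distance on the unit sphere as a stand-in for the elastic geodesic distance between closed-curve shapes. The names `sphere_distance` and `dtw` are hypothetical and introduced only for illustration.

```python
import math


def sphere_distance(p, q):
    """Arc-length distance between two unit vectors.

    Assumption: a simple stand-in for the elastic geodesic distance
    on the shape space, which the abstract does not specify in detail.
    """
    dot = sum(a * b for a, b in zip(p, q))
    # Clamp to [-1, 1] to guard against rounding before acos.
    return math.acos(max(-1.0, min(1.0, dot)))


def dtw(seq_a, seq_b, dist=sphere_distance):
    """Align two pose sequences with classic dynamic time warping.

    Returns the accumulated alignment cost and the warping path as a
    list of (index_in_a, index_in_b) pairs.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i poses of seq_a
    # with the first j poses of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance both sequences
                cost[i - 1][j],      # advance seq_a only
                cost[i][j - 1],      # advance seq_b only
            )

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


# Toy usage: the second sequence holds the first pose for one extra frame.
gesture_a = [(1.0, 0.0), (0.0, 1.0)]
gesture_b = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
total_cost, warping_path = dtw(gesture_a, gesture_b)
```

Because the distance function is passed in as `dist`, a true elastic geodesic distance could be substituted without changing the alignment logic. Averaging the aligned trajectories into a template would additionally require a mean computed on the shape space, which this sketch does not attempt.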
