Locally time-invariant models of human activities using trajectories on the Grassmannian

Human activity analysis is an important problem in computer vision, with applications in surveillance and in the summarization and indexing of consumer content. Complex human activities are characterized by non-linear dynamics that make learning, inference, and recognition hard. In this paper, we consider the problem of modeling and recognizing complex activities that exhibit time-varying dynamics. To this end, we describe activities as outputs of a linear dynamic system (LDS) whose parameters vary with time, that is, a time-varying linear dynamic system (TV-LDS). We discuss parameter estimation methods for this class of models under the assumption that the parameters are locally time-invariant. We then represent the space of LDS models as a Grassmann manifold, so that a TV-LDS model is defined as a trajectory on that manifold. We show how trajectories on the Grassmannian can be characterized using distance metrics and statistical methods that reflect the underlying geometry of the manifold. This results in more expressive and powerful models for complex human activities. We demonstrate the strength of the framework for activity-based summarization of long videos and recognition of complex human actions on two datasets.
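The pipeline outlined in the abstract (fit an LDS on each short window, map each fitted model to a point on the Grassmann manifold, and compare points with a geometry-aware distance) can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration and is not taken from the abstract: the PCA-based LDS fit, the use of the column space of a truncated observability matrix as the subspace representing each model, the arc-length distance computed from principal angles, and all function and parameter names (`fit_lds`, `observability_subspace`, `grassmann_distance`, `tv_lds_trajectory`, `d`, `m`, `window`, `step`).

```python
import numpy as np

def fit_lds(Y, d):
    """Fit x[t+1] = A x[t], y[t] = C x[t] to a window Y of shape (p, T).

    Assumed estimator: PCA (SVD) for C and the states, least squares for A.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :d]                              # (p, d) observation matrix
    X = np.diag(s[:d]) @ Vt[:d, :]            # (d, T) estimated state sequence
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])  # least-squares transition matrix
    return A, C

def observability_subspace(A, C, m):
    """Orthonormal basis for the column space of [C; CA; ...; CA^(m-1)]."""
    blocks = [C]
    for _ in range(m - 1):
        blocks.append(blocks[-1] @ A)
    O = np.vstack(blocks)                     # (m * p, d)
    Q, _ = np.linalg.qr(O)                    # reduced QR, Q is (m * p, d)
    return Q

def grassmann_distance(Q1, Q2):
    """Arc-length distance between two subspaces given by orthonormal bases."""
    cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))  # principal angles
    return float(np.linalg.norm(angles))

def tv_lds_trajectory(Y, d, window, step, m):
    """Treat each sliding window as time-invariant; return one subspace per window."""
    starts = range(0, Y.shape[1] - window + 1, step)
    return [observability_subspace(*fit_lds(Y[:, s:s + window], d), m)
            for s in starts]
```

Given a feature matrix `Y` with one column per frame, `tv_lds_trajectory(Y, d=2, window=20, step=10, m=3)` returns a list of orthonormal bases, and `grassmann_distance` applied to consecutive entries gives a rough measure of how quickly the local dynamics change along the trajectory. The sketch assumes each window has more frames than `d` and that features are preprocessed (for example, mean-subtracted) by the caller.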
