Unsupervised learning of motion patterns using generative models

This work introduces an unsupervised algorithm for learning generative models for the classification and recognition of human activities (specifically, pedestrian trajectories), with application to video surveillance. The proposed algorithm has two main components: (i) a set of low-level dynamical models of the trajectories, estimated in an unsupervised manner using the expectation-maximization (EM) algorithm, with automatic model selection using the minimum message length (MML) criterion; (ii) a switching dynamical model, described by a hidden Markov model (HMM), used to characterize the higher-level activities. The hierarchical model with these two levels is herein denoted the switched dynamical hidden Markov model (SD-HMM). We illustrate the performance of the proposed technique for human activity recognition on a university campus.
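The two-level structure can be illustrated with a minimal sketch of the recognition step. It is not the authors' implementation: the abstract does not give the form of the low-level models, so the sketch assumes each one is a constant mean displacement with isotropic Gaussian noise, and it sets all parameters by hand rather than estimating them with EM and MML. The names `log_gauss2d`, `logsumexp`, and `sdhmm_loglik` are hypothetical. Each activity is represented by its own HMM over the shared low-level models, and a trajectory is assigned to the activity with the highest forward-algorithm log-likelihood.

```python
import math

def log_gauss2d(d, mean, var):
    """Log-density of a 2-D displacement d under an isotropic Gaussian (assumed form)."""
    dx, dy = d[0] - mean[0], d[1] - mean[1]
    return -math.log(2 * math.pi * var) - (dx * dx + dy * dy) / (2 * var)

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def sdhmm_loglik(traj, means, var, trans, init):
    """Forward-algorithm log-likelihood of a trajectory's displacements.

    means: mean displacement of each low-level model (assumption)
    trans: HMM transition matrix over the low-level models (no zero entries)
    init:  initial probabilities of the low-level models (no zero entries)
    """
    disp = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(traj, traj[1:])]
    K = len(means)
    alpha = [math.log(init[k]) + log_gauss2d(disp[0], means[k], var)
             for k in range(K)]
    for d in disp[1:]:
        alpha = [logsumexp([alpha[j] + math.log(trans[j][k]) for j in range(K)])
                 + log_gauss2d(d, means[k], var)
                 for k in range(K)]
    return logsumexp(alpha)

# Two hand-set low-level models: "move right" and "move up".
means = [(1.0, 0.0), (0.0, 1.0)]
var = 0.1

# Two hypothetical activities that differ only in their switching behaviour.
right_then_up = {"init": [0.9, 0.1], "trans": [[0.8, 0.2], [0.01, 0.99]]}
up_then_right = {"init": [0.1, 0.9], "trans": [[0.99, 0.01], [0.2, 0.8]]}

traj = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]  # right, right, up, up
ll_a = sdhmm_loglik(traj, means, var, right_then_up["trans"], right_then_up["init"])
ll_b = sdhmm_loglik(traj, means, var, up_then_right["trans"], up_then_right["init"])
print("right_then_up" if ll_a > ll_b else "up_then_right")
```

Working in the log domain avoids underflow on long trajectories, at the cost of requiring strictly positive initial and transition probabilities in this simplified version.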
