Trajectory Classification Using Switched Dynamical Hidden Markov Models

This paper proposes an approach for recognizing human activities, more specifically pedestrian trajectories, in video sequences captured in a surveillance context. A system that automatically processes video for surveillance should be able to detect and recognize human activity and collect statistics about it, with as little human intervention as possible. In the proposed method, a human trajectory is modeled as a concatenation of segments, each produced by one of a set of low-level dynamical models. These low-level models are estimated in an unsupervised fashion from a finite mixture formulation using the expectation-maximization (EM) algorithm, and the number of models is selected automatically with a minimum message length (MML) criterion. This yields a parsimonious set of models tuned to the complexity of the scene. Switching among the low-level dynamical models is described by a hidden Markov chain; the complete model is therefore termed a switched dynamical hidden Markov model (SD-HMM). The performance of the proposed method is illustrated with real data from two different scenarios, a shopping center and a university campus, in which the proposed system successfully recognizes a set of human activities. These experiments show that the approach can properly describe trajectories with sudden changes.
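The pipeline in the abstract (EM-estimated low-level dynamical models, a hidden Markov chain that switches among them, and recognition of the activity a trajectory belongs to) can be illustrated with the minimal Python sketch below. It is not the authors' implementation and rests on simplifying assumptions: each low-level model is taken to be an isotropic 2-D Gaussian over the frame-to-frame displacement d_t = x_t - x_{t-1}; the number of models K is fixed by hand, so the MML-based selection of K is omitted; the low-level models are shared by all activities, while each activity is assumed to have its own initial distribution `pi` and transition matrix `A`; and a trajectory is assigned to the activity whose SD-HMM gives the highest log-likelihood, conditioned on the first point. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, float]


def safe_log(p: float) -> float:
    """log(p), returning -inf for p == 0 instead of raising ValueError."""
    return math.log(p) if p > 0.0 else -math.inf


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def gauss_logpdf(d: Vec, mean: Vec, var: float) -> float:
    """Log-density of an isotropic 2-D Gaussian N(mean, var * I)."""
    dx, dy = d[0] - mean[0], d[1] - mean[1]
    return -math.log(2.0 * math.pi * var) - (dx * dx + dy * dy) / (2.0 * var)


def displacements(traj: Sequence[Vec]) -> List[Vec]:
    """d_t = x_t - x_{t-1}, the quantity each low-level model is assumed to explain."""
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(traj, traj[1:])]


def em_mixture(disps: Sequence[Vec], init_means: Sequence[Vec], iters: int = 50):
    """EM for a K-component isotropic Gaussian mixture on displacements (K fixed, no MML)."""
    K, n = len(init_means), len(disps)
    means, variances, weights = list(init_means), [1.0] * K, [1.0 / K] * K
    for _ in range(iters):
        # E-step: responsibilities r[t][k] = P(model k | d_t) under current parameters
        resp = []
        for d in disps:
            logs = [safe_log(weights[k]) + gauss_logpdf(d, means[k], variances[k])
                    for k in range(K)]
            z = logsumexp(logs)
            resp.append([math.exp(l - z) for l in logs])
        # M-step: responsibility-weighted re-estimation of each model
        for k in range(K):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                weights[k] = 0.0  # component emptied out; keep its old mean and variance
                continue
            mx = sum(r[k] * d[0] for r, d in zip(resp, disps)) / nk
            my = sum(r[k] * d[1] for r, d in zip(resp, disps)) / nk
            ss = sum(r[k] * ((d[0] - mx) ** 2 + (d[1] - my) ** 2) for r, d in zip(resp, disps))
            weights[k], means[k] = nk / n, (mx, my)
            variances[k] = max(ss / (2.0 * nk), 1e-6)  # floor avoids degenerate components
    return means, variances, weights


def sd_hmm_loglik(traj: Sequence[Vec], means: Sequence[Vec], variances: Sequence[float],
                  pi: Sequence[float], A: Sequence[Sequence[float]]) -> float:
    """log p(d_2..d_n | activity) by the forward algorithm in log space.

    The hidden state at each step is the index of the active low-level model."""
    disps = displacements(traj)
    K = len(means)

    def emit(k: int, d: Vec) -> float:
        return gauss_logpdf(d, means[k], variances[k])

    alpha = [safe_log(pi[k]) + emit(k, disps[0]) for k in range(K)]
    for d in disps[1:]:
        alpha = [logsumexp([alpha[j] + safe_log(A[j][k]) for j in range(K)]) + emit(k, d)
                 for k in range(K)]
    return logsumexp(alpha)


def classify(traj: Sequence[Vec], means: Sequence[Vec], variances: Sequence[float],
             activities: Dict[str, Tuple[List[float], List[List[float]]]]):
    """Maximum-likelihood activity label; `activities` maps name -> (pi, A)."""
    scores = {name: sd_hmm_loglik(traj, means, variances, pi, A)
              for name, (pi, A) in activities.items()}
    return max(scores, key=scores.get), scores
```

As a usage example, with two low-level models ("move right" and "move up") and two hypothetical activities, a path that walks right and then turns sharply upward is better explained by the chain that allows a switch, which is the kind of sudden change the abstract refers to:

```python
means, variances = [(1.0, 0.0), (0.0, 1.0)], [0.05, 0.05]
activities = {
    "straight_right": ([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]),
    "turn_up": ([1.0, 0.0], [[0.8, 0.2], [0.0, 1.0]]),
}
turn = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
label, scores = classify(turn, means, variances, activities)
```

Working on displacements rather than positions keeps the low-level models independent of where in the scene a segment occurs; a richer sketch would add full covariances and the MML pruning of K.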
