Coupled Observation Decomposed Hidden Markov Model for Multiperson Activity Recognition

Multiperson activity recognition in videos is challenging because of the complexity of interactions among multiple persons. This paper presents a new statistical model, the coupled observation decomposed hidden Markov model (CODHMM), for modeling multiperson activities in videos. A human activity that involves multiple persons is analyzed at two levels: the individual level, which describes each individual's motion details, and the interaction level, which expresses the information shared among the persons. The two levels are modeled by two interdependent hidden Markov chains that interact with each other. The observation in each chain at each time slice is decomposed into subobservations according to the number of features and the number of persons. For each activity to be recognized, a CODHMM is built and its parameters are learned by a generalized expectation maximization (EM) algorithm. Given an input video that contains an unknown activity, maximum likelihood algorithms classify it into one of the learned activity categories. Experimental results show that the CODHMM classifies human activities involving multiple persons with high accuracy and low computational cost.
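The classification step described above (one model per activity, pick the model with the highest likelihood) can be illustrated with a much simpler stand-in. The sketch below is not the CODHMM: it uses a single-chain discrete HMM, and it assumes that the subobservations at a time slice are conditionally independent given the hidden state, so the emission probability factorizes into a product. The function names, the dictionary layout of a model, and the toy activities "calm" and "active" are all hypothetical and chosen only for illustration.

```python
import math


def safe_log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(model, sequence):
    """Forward algorithm in log space for a single-chain discrete HMM.

    model["pi"][s]       : initial probability of state s
    model["A"][r][s]     : transition probability from state r to state s
    model["B"][s][k][o]  : probability that subobservation k equals symbol o
                           in state s (assumed independent across k)
    sequence             : list of tuples, one symbol per subobservation
    """
    pi, A, B = model["pi"], model["A"], model["B"]
    n = len(pi)

    def log_emit(state, obs):
        # Assumption: the joint emission is a product over subobservations.
        return sum(safe_log(B[state][k][sym]) for k, sym in enumerate(obs))

    alpha = [safe_log(pi[s]) + log_emit(s, sequence[0]) for s in range(n)]
    for obs in sequence[1:]:
        alpha = [
            log_sum_exp([alpha[r] + safe_log(A[r][s]) for r in range(n)])
            + log_emit(s, obs)
            for s in range(n)
        ]
    return log_sum_exp(alpha)


def classify(models, sequence):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(models[name], sequence))


# Hypothetical toy models: two hidden states, two persons, binary symbols.
TOY_MODELS = {
    "calm": {
        "pi": [0.5, 0.5],
        "A": [[0.9, 0.1], [0.1, 0.9]],
        "B": [[[0.9, 0.1], [0.9, 0.1]], [[0.8, 0.2], [0.8, 0.2]]],
    },
    "active": {
        "pi": [0.5, 0.5],
        "A": [[0.9, 0.1], [0.1, 0.9]],
        "B": [[[0.1, 0.9], [0.1, 0.9]], [[0.2, 0.8], [0.2, 0.8]]],
    },
}

if __name__ == "__main__":
    video = [(0, 0), (0, 0), (0, 1)]  # one tuple of subobservations per frame
    print(classify(TOY_MODELS, video))
```

Working in log space avoids numerical underflow on long sequences. A model closer to the one described above would replace the single chain with two coupled chains and would obtain the parameters from EM training rather than from hand-set tables.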
