Unsupervised scene analysis: a hidden Markov model approach

This paper presents a new approach to scene analysis that aims to extract structured information from a video sequence directly from low-level data. The method models the sequence with a forest of hidden Markov models (HMMs), from which two kinds of information are extracted: static and dynamic. The static information yields a segmentation that explains how the chromatic aspect of the static part of the scene evolves. The dynamic information yields the detection of the areas most affected by foreground activity. The former is obtained by spatial clustering of the HMMs, resulting in a spatio-temporal segmentation of the video sequence that is robust to noise and clutter and disregards any moving objects in the scene. The latter is estimated using an entropy-like measure defined on the stationary probability of the Markov chain associated with each HMM, producing a consistent and continuous partition of the scene into activity zones. The proposed approach constitutes a principled, unified probabilistic framework for low-level scene analysis and understanding. Compared with state-of-the-art methods, its key features are that it extracts information at the lowest possible level (using only the temporal behavior of pixel gray levels) and that it is unsupervised. Results on real indoor and outdoor sequences show the efficacy of the proposed approach.
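As a rough illustration of the dynamic part, the sketch below computes the stationary distribution of a Markov chain from its transition matrix and then takes the Shannon entropy of that distribution. The abstract does not give the exact form of the entropy-like measure, so plain Shannon entropy is an assumption here, and the function names `stationary_distribution` and `stationary_entropy` are hypothetical. The transition matrices are toy values, not taken from the paper.

```python
import numpy as np


def stationary_distribution(A):
    """Return pi with pi @ A = pi and sum(pi) = 1 for a row-stochastic matrix A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    # Stack the balance equations (A^T - I) pi = 0 with the constraint sum(pi) = 1
    # and solve the resulting system in the least-squares sense.
    M = np.vstack([A.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(M, b, rcond=None)
    pi = np.clip(pi, 0.0, None)  # remove tiny negative values from round-off
    return pi / pi.sum()


def stationary_entropy(A):
    """Shannon entropy (in nats) of the stationary distribution.

    Assumption: this stands in for the paper's entropy-like measure,
    whose exact definition is not stated in the abstract.
    """
    pi = stationary_distribution(A)
    nz = pi[pi > 0]
    return float(-(nz * np.log(nz)).sum())


# Toy example: a chain that stays almost always in one state (quiet location)
# versus one that spreads its time evenly over two states (active location).
quiet = [[0.99, 0.01], [0.99, 0.01]]
active = [[0.5, 0.5], [0.5, 0.5]]
print(stationary_entropy(quiet), stationary_entropy(active))
```

Under this assumption, a chain that spends nearly all its time in one state scores close to zero, while a chain that divides its time among several states scores higher. Evaluating such a score for each HMM in the forest would give a map that could then be partitioned into activity zones.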
