Modeling Interaction Using Learnt Qualitative Spatio-Temporal Relations and Variable Length Markov Models

Motivated by applications such as automated visual surveillance and video monitoring and annotation, there has been considerable interest in constructing cognitive vision systems capable of interpreting the high-level semantics of dynamic scenes. In this paper we present a novel approach for automatically inferring models of object interactions that can be used to interpret observed behaviour within a scene. A real-time low-level computer vision system, together with an attentional control mechanism, is used to identify incidents or events that occur in the scene. A data-driven approach is taken to automatically infer discrete, abstract representations (symbols) of primitive object interactions; in effect, the system learns a set of qualitative spatial relations relevant to the dynamic behaviour of the domain. These symbols then form the alphabet of a variable length Markov model (VLMM), which automatically infers the high-level structure of typical interactive behaviour. The learnt behaviour model has generative capabilities and can also recognize typical or atypical activities within a scene. Experiments have been performed within the traffic monitoring domain; however, the proposed method is applicable to the general automatic surveillance task, since it does not assume a priori knowledge of a specific domain.
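
To make the role of the symbol alphabet concrete, the following is a minimal sketch of a variable length Markov model over discrete interaction symbols. It is not the implementation used in the paper: the class name `SimpleVLMM`, the parameters `max_order` and `min_count`, and the symbol names are all hypothetical, and the context-selection rule (use the longest context seen at least `min_count` times) is a simplifying assumption standing in for whatever criterion the authors apply.

```python
from collections import Counter, defaultdict


class SimpleVLMM:
    """Toy variable length Markov model (illustrative assumption only)."""

    def __init__(self, max_order=3, min_count=2):
        self.max_order = max_order  # longest context length considered
        self.min_count = min_count  # minimum support needed to trust a context
        self.counts = defaultdict(Counter)  # context tuple -> next-symbol counts

    def fit(self, sequences):
        # Count every symbol under each of its preceding contexts,
        # from the empty context up to max_order symbols.
        for seq in sequences:
            for i, symbol in enumerate(seq):
                for k in range(self.max_order + 1):
                    if i - k < 0:
                        break
                    self.counts[tuple(seq[i - k:i])][symbol] += 1
        return self

    def _context_for(self, history):
        # Back off from the longest suffix of the history to shorter ones
        # until a context with enough observations is found.
        history = tuple(history)
        for k in range(min(self.max_order, len(history)), 0, -1):
            context = history[len(history) - k:]
            seen = self.counts.get(context)
            if seen is not None and sum(seen.values()) >= self.min_count:
                return context
        return ()  # fall back to the empty (order-0) context

    def prob(self, symbol, history):
        # Probability of `symbol` given the selected variable-length context.
        nexts = self.counts.get(self._context_for(history), Counter())
        total = sum(nexts.values())
        return nexts[symbol] / total if total else 0.0


# Hypothetical symbol sequences; real symbols would be learnt from video.
training = [["approach", "follow", "overtake"]] * 3 + [["queue", "follow", "stop"]] * 3
model = SimpleVLMM(max_order=3, min_count=2).fit(training)
print(model.prob("overtake", ["approach", "follow"]))
print(model.prob("stop", ["approach", "follow"]))
```

In this sketch, a low probability for an observed symbol given its history is one way an activity could be flagged as atypical, while sampling from the same conditional distributions would give the generative behaviour the abstract mentions.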
