Modelling Scenes Using the Activity within Them

This paper describes a method for building visual "maps" from video data using quantized descriptions of motion. This enables unsupervised classification of scene regions based upon the motion patterns observed within them. Our aim is to recognise generic places using a qualitative representation of the spatial layout of regions with common motion patterns. Such places are characterised by the distribution of these motion patterns, as opposed to static appearance patterns, and could include locations such as train platforms, bus stops, and park benches. Motion descriptions are obtained by tracking image features over a temporal window, and are then subjected to normalisation and thresholding to provide a quantized representation of each feature's gross motion. Input video is quantized spatially into N×N pixel blocks, and a histogram of the frequency of occurrence of each quantized motion vector is then built for each of these small areas of the scene. Within each block we can therefore characterise the dominant patterns of motion, and then group spatial regions based upon both proximity and local motion similarity to define areas with particular motion characteristics. Moving up a level, we then consider the relationship between the motion in adjacent spatial areas, and can characterise the dominant patterns of motion expected in a particular part of the scene over time. The current paper differs from previous work, which has largely been based on the paths of moving agents and has therefore been restricted to scenes in which such paths are identifiable. We demonstrate our method in three very different scenes: an indoor room with multiple chairs and unpredictable, unconstrained motion; an underground station featuring regions where motion is constrained (train tracks) and regions with complicated motion and difficult occlusion relationships (platform); and an outdoor scene with challenging camera motion and partially overlapping video streams.
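The pipeline described in the abstract (quantize each tracked feature's gross motion, build a histogram of motion codes per N×N block, then group neighbouring blocks with similar histograms) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration rather than the paper's implementation: the codebook (eight compass directions plus a "stationary" code), the `min_speed` threshold, the block size, histogram intersection as the similarity measure, and a flood fill over 4-connected blocks standing in for whatever grouping the paper actually uses. All function names are hypothetical, and the per-feature displacements `(dx, dy)` are assumed to come from a separate feature tracker that is not shown.

```python
import math
from collections import defaultdict

# Hypothetical codebook (an assumption): 8 compass directions plus one
# "stationary" code, so each histogram has N_CODES bins.
N_DIRECTIONS = 8
STATIONARY = N_DIRECTIONS
N_CODES = N_DIRECTIONS + 1


def quantize_motion(dx, dy, min_speed=1.0):
    """Map a feature's net displacement over the temporal window to a code.

    Assumed interpretation: normalisation keeps only the direction of motion,
    and thresholding treats displacements shorter than min_speed pixels as
    stationary.
    """
    if math.hypot(dx, dy) < min_speed:
        return STATIONARY
    angle = math.atan2(dy, dx) % (2 * math.pi)
    sector = 2 * math.pi / N_DIRECTIONS
    # Centre each sector on its compass direction; wrap the last half-sector to 0.
    return int((angle + sector / 2) // sector) % N_DIRECTIONS


def block_histograms(tracks, block_size):
    """Count motion codes per block_size x block_size pixel block.

    tracks: iterable of (x, y, dx, dy), where (x, y) is the feature's start
    position and (dx, dy) its displacement over the window.
    Returns {(row, col): [count per code]} for blocks that saw any feature.
    """
    hists = defaultdict(lambda: [0] * N_CODES)
    for x, y, dx, dy in tracks:
        block = (int(y) // block_size, int(x) // block_size)
        hists[block][quantize_motion(dx, dy)] += 1
    return dict(hists)


def similarity(h1, h2):
    """Histogram intersection of the two normalised histograms, in [0, 1]."""
    s1, s2 = sum(h1), sum(h2)
    return sum(min(a / s1, b / s2) for a, b in zip(h1, h2))


def group_blocks(hists, threshold=0.6):
    """Label 4-connected blocks whose neighbouring histograms are similar.

    A simple flood fill used as a stand-in grouping step; the threshold is
    arbitrary. Returns {(row, col): region_label}.
    """
    labels = {}
    next_label = 0
    for seed in hists:
        if seed in labels:
            continue
        labels[seed] = next_label
        stack = [seed]
        while stack:
            r, c = stack.pop()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if (nb in hists and nb not in labels
                        and similarity(hists[(r, c)], hists[nb]) >= threshold):
                    labels[nb] = next_label
                    stack.append(nb)
        next_label += 1
    return labels


# Toy tracks (made up for illustration): rightward motion in block (0, 0),
# downward motion in blocks (1, 0) and (1, 1), a near-static feature in (0, 1).
tracks = [
    (5, 5, 10, 0), (8, 3, 12, 1),
    (5, 25, 0, 10), (7, 28, 1, 9), (25, 25, 0, 8),
    (25, 5, 0.2, 0.1),
]
hists = block_histograms(tracks, block_size=20)
regions = group_blocks(hists)
print(regions)  # {(0, 0): 0, (1, 0): 1, (1, 1): 1, (0, 1): 2}
```

In this toy input the two downward-moving blocks merge into one region, while the rightward and stationary blocks each form their own. Because the flood fill compares each block only with its immediate neighbour, similarity can drift along a chain of blocks; a global labelling method would avoid this at the cost of more machinery.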
