Scene Modelling and Classification Using Learned Spatial Relations

This paper describes a method for building visual scene models from video data using quantized descriptions of motion. The method enables us to make meaningful statements about video scenes as a whole (such as "this video is like that video") and about regions within these scenes (such as "this part of this scene is similar to this part of that scene"). We do this through unsupervised clustering of simple yet novel motion descriptors, which provide a quantized representation of gross motion within scene regions. Using these descriptors, we characterise the dominant patterns of motion and then group spatial regions by both proximity and local motion similarity, defining regions with particular motion characteristics. We can process scenes in which objects are difficult to detect and track because of variable frame rate, poor video quality or occlusion, and we can identify regions that differ in usage but not in appearance (such as frequently used paths across open space). We demonstrate our method on 50 videos covering very different scene types: indoor scenarios with unpredictable, unconstrained motion; junction scenes; road and path scenes; and open squares or plazas. We show that these scenes can be clustered using our representation, and that incorporating learned spatial relations into the representation makes the clustering more effective.
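The pipeline summarised above (quantize local motion, cluster the descriptors without supervision, group neighbouring cells with similar motion, then compare whole scenes) can be illustrated with a small sketch. This is not the authors' implementation. The direction-histogram descriptor, the hypothetical constants `N_DIR_BINS` and `STATIC_THRESH`, plain k-means as the clustering step, 4-connected flood fill as the region grouping, and counts of adjacent label pairs standing in for "learned spatial relations" are all illustrative assumptions. It also assumes that motion vectors (for example from a feature tracker) have already been binned into a regular grid of cells.

```python
import math
from collections import Counter

N_DIR_BINS = 8        # hypothetical: number of motion-direction sectors
STATIC_THRESH = 0.5   # hypothetical: magnitude (pixels/frame) treated as "no motion"

def quantize(dx, dy):
    """Bin 0 = negligible motion; bins 1..N_DIR_BINS = direction sectors."""
    if math.hypot(dx, dy) < STATIC_THRESH:
        return 0
    angle = math.atan2(dy, dx) % (2 * math.pi)
    return 1 + int(angle / (2 * math.pi) * N_DIR_BINS) % N_DIR_BINS

def cell_descriptor(vectors):
    """Normalised histogram of the quantized motion vectors seen in one grid cell."""
    hist = [0.0] * (N_DIR_BINS + 1)
    for dx, dy in vectors:
        hist[quantize(dx, dy)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(desc, centres):
    """Index of the nearest cluster centre, i.e. the cell's motion label."""
    return min(range(len(centres)), key=lambda j: sq_dist(desc, centres[j]))

def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialisation; returns centres."""
    centres = [list(points[0])]
    while len(centres) < k:
        centres.append(list(max(points, key=lambda p: min(sq_dist(p, c) for c in centres))))
    for _ in range(iters):
        labels = [assign(p, centres) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return centres

def group_regions(labels):
    """Flood-fill 4-connected cells sharing a motion label into regions."""
    rows, cols = len(labels), len(labels[0])
    region = [[-1] * cols for _ in range(rows)]
    n = 0
    for r in range(rows):
        for c in range(cols):
            if region[r][c] != -1:
                continue
            region[r][c], stack = n, [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols and region[ny][nx] == -1
                            and labels[ny][nx] == labels[y][x]):
                        region[ny][nx] = n
                        stack.append((ny, nx))
            n += 1
    return region, n

def scene_signature(labels, k):
    """Label frequencies plus frequencies of adjacent (unordered) label pairs."""
    rows, cols = len(labels), len(labels[0])
    unary = Counter(lab for row in labels for lab in row)
    pairs = Counter()
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                pairs[tuple(sorted((labels[r][c], labels[r][c + 1])))] += 1
            if r + 1 < rows:
                pairs[tuple(sorted((labels[r][c], labels[r + 1][c])))] += 1
    n_cells, n_pairs = rows * cols, max(sum(pairs.values()), 1)
    sig = [unary[i] / n_cells for i in range(k)]
    sig += [pairs[(i, j)] / n_pairs for i in range(k) for j in range(i, k)]
    return sig

def scene_distance(a, b):
    """L1 distance between two scene signatures."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Toy usage: 2x4 grids where each cell holds the (dx, dy) vectors observed there.
S, R, V = [(0, 0)] * 5, [(3, 0)] * 5, [(0, 3)] * 5   # static, horizontal, vertical
scene_a = [[S, S, R, R], [S, S, R, R]]
scene_b = [[S, S, [(2.5, 0.1)] * 5, R], [S, S, R, R]]
scene_c = [[V] * 4, [V] * 4]
scenes = [scene_a, scene_b, scene_c]
descs = [cell_descriptor(v) for sc in scenes for row in sc for v in row]
centres = kmeans(descs, k=3)   # one motion vocabulary shared by all videos
labelled = [[[assign(cell_descriptor(v), centres) for v in row] for row in sc]
            for sc in scenes]
sigs = [scene_signature(lab, 3) for lab in labelled]
print(scene_distance(sigs[0], sigs[1]), scene_distance(sigs[0], sigs[2]))
print(group_regions(labelled[0])[1])   # contiguous motion regions in scene A
```

Because the cluster centres are learned from all videos jointly, a label index means the same thing in every scene, which is what makes the signatures comparable. In this toy example scenes A and B produce identical signatures, while the all-vertical scene C is far from both. The pair counts only hint at how spatial relations between motion regions might enter the representation; the relations actually learned in the paper may be defined quite differently.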
