Moving vistas: Exploiting motion for describing scenes

Scene recognition in an unconstrained setting is an open and challenging problem with wide applications. In this paper, we study the role of scene dynamics for improved representation of scenes. We subsequently propose dynamic attributes which can be augmented with spatial attributes of a scene for semantically meaningful categorization of dynamic scenes. We further explore accurate and generalizable computational models for characterizing the dynamics of unconstrained scenes. The large intra-class variation due to unconstrained settings and the complex underlying physics present challenging problems in modeling scene dynamics. Motivated by these factors, we propose using the theory of chaotic systems to capture dynamics. Due to the lack of a suitable dataset, we compiled a dataset of ‘in-the-wild’ dynamic scenes. Experimental results show that the proposed framework leads to the best classification rate among other well-known dynamic modeling techniques. We also show how these dynamic features provide a means to describe dynamic scenes with motion-attributes, which then leads to meaningful organization of the video data.

[1]  Stefano Soatto,et al.  Dynamic Textures , 2003, International Journal of Computer Vision.

[2]  Antonio Torralba,et al.  Recognizing indoor scenes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  G. Williams Chaos theory tamed , 1997 .

[4]  Cordelia Schmid,et al.  Actions in context , 2009, CVPR.

[5]  Nava Rubin,et al.  Real-world scenes can be easily recognized from their edge-detected renditions: Just add motion! , 2010 .

[6]  Rama Chellappa,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 Matching Shape Sequences in Video with Applications in Human Movement Analysis. Ieee Transactions on Pattern Analysis and Machine Intelligence 2 , 2022 .

[7]  M. Perc The dynamics of human gait , 2005 .

[8]  M. Rosenstein,et al.  A practical method for calculating largest Lyapunov exponents from small data sets , 1993 .

[9]  Juan Carlos Niebles,et al.  Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words , 2006, BMVC.

[10]  Rosalind W. Picard,et al.  Texture orientation for sorting photos "at a glance" , 1994, Proceedings of 12th International Conference on Pattern Recognition.

[11]  Li Fei-Fei,et al.  Towards total scene understanding: Classification, annotation and segmentation in an automatic framework , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Nuno Vasconcelos,et al.  Layered Dynamic Textures , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Alexei A. Efros,et al.  Discovering object categories in image collections , 2005 .

[14]  S. Ullman The Interpretation of Visual Motion , 1979 .

[15]  Serge J. Belongie,et al.  Behavior recognition via sparse spatio-temporal features , 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.

[16]  Shree K. Nayar,et al.  Attribute and simile classifiers for face verification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[17]  Mubarak Shah,et al.  Chaotic Invariants for Human Action Recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[18]  Fraser,et al.  Independent coordinates for strange attractors from mutual information. , 1986, Physical review. A, General physics.

[19]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[20]  G. Pike,et al.  Recognizing moving faces: The relative contribution of motion and perspective view information. , 1997 .

[21]  Anil K. Jain,et al.  Image classification for content-based indexing , 2001, IEEE Trans. Image Process..

[22]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[23]  Fatih Murat Porikli,et al.  Region Covariance: A Fast Descriptor for Detection and Classification , 2006, ECCV.

[24]  B. Moor,et al.  Subspace angles and distances between ARMA models , 2000 .

[25]  René Vidal,et al.  Optical flow estimation & segmentation of multiple moving dynamic textures , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[26]  Christoph H. Lampert,et al.  Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Antonio Torralba,et al.  Learning hierarchical models of scenes, objects, and parts , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[28]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[29]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[30]  Cordelia Schmid,et al.  Actions in context , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[31]  Ali Farhadi,et al.  Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[32]  Larry S. Davis,et al.  Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers , 2008, ECCV.

[33]  Zenon W. Pylyshyn,et al.  Computational processes in human vision : an interdisciplinary perspective , 1988 .

[34]  Martin Szummer,et al.  Indoor-outdoor image classification , 1998, Proceedings 1998 IEEE International Workshop on Content-Based Access of Image and Video Database.