View-invariant dynamic texture recognition using a bag of dynamical systems

In this paper, we consider the problem of categorizing videos of dynamic textures under varying viewpoint. We propose to model each video with a collection of linear dynamical systems (LDSs), each describing the dynamics of a spatiotemporal video patch. This bag of systems (BoS) representation is analogous to the bag of features (BoF) representation, except that LDSs serve as the feature descriptors. This choice poses several technical challenges for the BoF framework. Most notably, LDSs do not live in a Euclidean space, so new methods are needed for clustering LDSs and for computing codewords of LDSs. Our framework addresses these issues by combining nonlinear dimensionality reduction and clustering techniques with the Martin distance between LDSs. Our experiments show that the BoS approach can recognize dynamic textures in challenging scenarios that existing dynamic texture recognition methods could not handle.
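Because LDSs do not live in a Euclidean space, comparing two of them requires a distance defined on the systems themselves. The following is a minimal sketch of one common way to approximate the squared Martin distance, assuming each LDS is given by a state-transition matrix A and an observation matrix C. It uses a finite-horizon observability matrix and principal angles between the resulting subspaces; the function names, the `horizon` parameter, and the finite-horizon truncation are illustrative assumptions, not the implementation used in the paper.

```python
import numpy as np

def observability(A, C, horizon=50):
    """Stack C, CA, CA^2, ... up to a finite horizon (an approximation
    of the infinite extended observability matrix)."""
    blocks = []
    M = C.copy()
    for _ in range(horizon):
        blocks.append(M)
        M = M @ A
    return np.vstack(blocks)

def martin_distance_sq(A1, C1, A2, C2, horizon=50):
    """Approximate squared Martin distance: -ln prod_i cos^2(theta_i),
    where theta_i are the principal angles between the column spaces
    of the two (truncated) observability matrices."""
    # Orthonormal bases for the two observability subspaces.
    Q1, _ = np.linalg.qr(observability(A1, C1, horizon))
    Q2, _ = np.linalg.qr(observability(A2, C2, horizon))
    # Singular values of Q1^T Q2 are the cosines of the principal angles.
    cosines = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    # Clip to avoid log(0) and round-off slightly above 1.
    cosines = np.clip(cosines, 1e-12, 1.0)
    return float(-2.0 * np.sum(np.log(cosines)))
```

Under these assumptions, the distance depends only on the observability subspaces, so two parameterizations of the same system that differ by a change of state basis are at (numerically) zero distance. A matrix of such pairwise distances is the kind of input that a nonlinear dimensionality reduction step followed by clustering could use to form codewords.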
