Growing a bag of systems tree for fast and accurate classification

The bag-of-systems (BoS) representation is a descriptor of motion in a video, where dynamic texture (DT) codewords represent the typical motion patterns in spatio-temporal patches extracted from the video. The efficacy of the BoS descriptor depends on the richness of the codebook, which directly depends on the number of codewords in the codebook. However, for even modest sized codebooks, mapping videos onto the codebook results in a heavy computational load. In this paper we propose the BoS Tree, which constructs a bottom-up hierarchy of codewords that enables efficient mapping of videos to the BoS codebook. By leveraging the tree structure to efficiently index the codewords, the BoS Tree allows for fast look-ups in the codebook and enables the practical use of larger, richer codebooks. We demonstrate the effectiveness of BoS Trees on classification of three video datasets, as well as on annotation of a music dataset.

[1]  Jon Louis Bentley,et al.  Multidimensional binary search trees used for associative searching , 1975, CACM.

[2]  Nuno Vasconcelos,et al.  Modeling, Clustering, and Segmenting Video with Mixtures of Dynamic Textures , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Richard P. Wildes,et al.  Dynamic texture recognition based on distributions of spacetime oriented structure , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4]  Andrew Gilbert,et al.  Action Recognition Using Mined Hierarchical Compound Features , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Gert R. G. Lanckriet,et al.  Semantic Annotation and Retrieval of Music using a Bag of Systems Representation , 2011, ISMIR.

[6]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Yihong Gong,et al.  Human action detection by boosting efficient motion features , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.

[8]  Lawrence Cayton,et al.  Fast nearest neighbor retrieval for bregman divergences , 2008, ICML '08.

[9]  Stefano Soatto,et al.  Dynamic Textures , 2003, International Journal of Computer Vision.

[10]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[11]  S. Gong,et al.  Recognising action as clouds of space-time interest points , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Payam Saisan,et al.  Dynamic texture recognition , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[13]  René Vidal,et al.  View-invariant dynamic texture recognition using a bag of dynamical systems , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Matti Pietikäinen,et al.  Dynamic Texture Recognition Using Local Binary Patterns with an Application to Facial Expressions , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  R. Shumway,et al.  AN APPROACH TO TIME SERIES SMOOTHING AND FORECASTING USING THE EM ALGORITHM , 1982 .

[16]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[17]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[18]  Antoni B. Chan,et al.  Time Series Models for Semantic Music Annotation , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[19]  Nuno Vasconcelos Image indexing with mixture hierarchies , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[20]  Andrew W. Fitzgibbon,et al.  Shift-Invariant Dynamic Texture Recognition , 2006, ECCV.

[21]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[22]  Jeffrey K. Uhlmann,et al.  Satisfying General Proximity/Similarity Queries with Metric Trees , 1991, Inf. Process. Lett..

[23]  Trevor Darrell,et al.  Approximate Correspondences in High Dimensions , 2006, NIPS.

[24]  Allen Gersho,et al.  Vector quantization and signal compression , 1991, The Kluwer international series in engineering and computer science.

[25]  Antoni B. Chan,et al.  Clustering dynamic textures with the hierarchical EM algorithm , 2013, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[26]  Nuno Vasconcelos,et al.  Probabilistic kernels for the classification of auto-regressive visual processes , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).