More about VLAD: A leap from Euclidean to Riemannian manifolds

This paper extends the well-known Vector of Locally Aggregated Descriptors (VLAD), an image and video coding scheme, from Euclidean space to curved Riemannian manifolds. We provide a comprehensive mathematical framework that formulates the aggregation of such manifold-valued data. In particular, we consider two kinds of structured descriptors extracted from visual data: Region Covariance Descriptors, which reside on the manifold of Symmetric Positive Definite matrices, and linear subspaces, which reside on Grassmannian manifolds. Through experimental validation, we demonstrate the superior performance of the resulting Riemannian VLAD descriptor on several visual classification tasks, including video-based face recognition, dynamic scene recognition, and head pose classification.
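
To make the aggregation idea concrete, the following is a minimal sketch of a VLAD-style encoding for Symmetric Positive Definite descriptors. It is an illustration under stated assumptions, not the paper's formulation: it assumes the log-Euclidean metric (chosen here only because it reduces to ordinary vector arithmetic on matrix logarithms), a codebook that is already given, and the hypothetical helper names `spd_log` and `log_euclidean_vlad`.

```python
import numpy as np


def spd_log(S):
    """Matrix logarithm of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T


def log_euclidean_vlad(descriptors, codebook):
    """VLAD-style aggregation of SPD descriptors (illustrative sketch only).

    Assumption: distances and residuals use the log-Euclidean metric,
    i.e. the Frobenius norm of log(X) - log(C). Other metrics on the SPD
    manifold would need different distance and residual computations.

    descriptors: sequence of n SPD matrices, each d x d
    codebook:    sequence of k SPD matrices, each d x d
    returns:     L2-normalized vector of length k * d * d
    """
    logs_x = np.stack([spd_log(X) for X in descriptors])  # (n, d, d)
    logs_c = np.stack([spd_log(C) for C in codebook])     # (k, d, d)
    k, d, _ = logs_c.shape

    # Residual of every descriptor with respect to every codeword.
    diffs = logs_x[:, None] - logs_c[None]                # (n, k, d, d)
    dists = np.linalg.norm(diffs, axis=(2, 3))            # Frobenius norms, (n, k)
    assign = dists.argmin(axis=1)                         # nearest codeword per descriptor

    # Sum the residuals of the descriptors assigned to each codeword.
    vlad = np.zeros((k, d, d))
    for i, j in enumerate(assign):
        vlad[j] += diffs[i, j]

    v = vlad.reshape(-1)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

As in Euclidean VLAD, each descriptor contributes its residual to the nearest codeword and the per-codeword sums are concatenated and normalized. Only the notion of "residual" and "nearest" changes with the geometry.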
