Local Log-Euclidean Multivariate Gaussian Descriptor and Its Application to Image Classification

This paper presents a novel image descriptor to effectively characterize the local, high-order image statistics. Our work is inspired by the Diffusion Tensor Imaging and the structure tensor method (or covariance descriptor), and motivated by popular distribution-based descriptors such as SIFT and HoG. Our idea is to associate one pixel with a multivariate Gaussian distribution estimated in the neighborhood. The challenge lies in that the space of Gaussians is not a linear space but a Riemannian manifold. We show, for the first time to our knowledge, that the space of Gaussians can be equipped with a Lie group structure by defining a multiplication operation on this manifold, and that it is isomorphic to a subgroup of the upper triangular matrix group. Furthermore, we propose methods to embed this matrix group in the linear space, which enables us to handle Gaussians with Euclidean operations rather than complicated Riemannian operations. The resulting descriptor, called Local Log-Euclidean Multivariate Gaussian (L <inline-formula><tex-math notation="LaTeX">$^2$</tex-math><alternatives> <inline-graphic xlink:href="li-ieq1-2560816.gif"/></alternatives></inline-formula>EMG) descriptor, works well with low-dimensional and high-dimensional raw features. Moreover, our descriptor is a continuous function of features without quantization, which can model the first- and second-order statistics. Extensive experiments were conducted to evaluate thoroughly L<inline-formula><tex-math notation="LaTeX">$^2$</tex-math><alternatives> <inline-graphic xlink:href="li-ieq2-2560816.gif"/></alternatives></inline-formula>EMG, and the results showed that L <inline-formula><tex-math notation="LaTeX">$^2$</tex-math><alternatives> <inline-graphic xlink:href="li-ieq3-2560816.gif"/></alternatives></inline-formula>EMG is very competitive with state-of-the-art descriptors in image classification.

[1]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[2]  Qilong Wang,et al.  Local Log-Euclidean Covariance Matrix (L2ECM) for Image Representation and Its Applications , 2012, ECCV.

[3]  Edward H. Adelson,et al.  Recognizing Materials Using Perceptually Inspired Features , 2013, International Journal of Computer Vision.

[4]  Dieter Fox,et al.  Multipath Sparse Coding Using Hierarchical Matching Pursuit , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Carl-Fredrik Westin,et al.  Representing Local Structure Using Tensors II , 2011, SCIA.

[6]  Rita Cucchiara,et al.  Modeling local descriptors with multivariate gaussians for object and scene recognition , 2013, ACM Multimedia.

[7]  H. Knutsson Representing Local Structure Using Tensors , 1989 .

[8]  Cordelia Schmid,et al.  A performance evaluation of local descriptors , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[10]  Nicholas Ayache,et al.  Geometric Means in a Novel Vector Space Structure on Symmetric Positive-Definite Matrices , 2007, SIAM J. Matrix Anal. Appl..

[11]  Maximilian Bayer,et al.  Numerical Analysis Mathematics Of Scientific Computing , 2016 .

[12]  Bingpeng Ma,et al.  Gaussian Descriptor Based on Local Features for Person Re-identification , 2014, ACCV Workshops.

[13]  Raj Gupta,et al.  SMD: A Locally Stable Monotonic Change Invariant Feature Descriptor , 2008, ECCV.

[14]  Soura Dasgupta,et al.  Some properties of the matrix exponential , 2001 .

[15]  Vincent Lepetit,et al.  DAISY: An Efficient Dense Descriptor Applied to Wide-Baseline Stereo , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Koen E. A. van de Sande,et al.  Evaluating Color Descriptors for Object and Scene Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[18]  Shun-ichi Amari,et al.  Methods of information geometry , 2000 .

[19]  David H. Laidlaw,et al.  Visualization and image processing of tensor fields , 2006 .

[20]  Andrew Zisserman,et al.  The devil is in the details: an evaluation of recent feature encoding methods , 2011, BMVC.

[21]  Thomas Mensink,et al.  Image Classification with the Fisher Vector: Theory and Practice , 2013, International Journal of Computer Vision.

[22]  Andrea Vedaldi,et al.  Vlfeat: an open and portable library of computer vision algorithms , 2010, ACM Multimedia.

[23]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[24]  B. Hall Lie Groups, Lie Algebras, and Representations: An Elementary Introduction , 2004 .

[25]  William T. Freeman,et al.  Orientation Histograms for Hand Gesture Recognition , 1995 .

[26]  James M. Rehg,et al.  CENTRIST: A Visual Descriptor for Scene Categorization , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Cordelia Schmid,et al.  Aggregating Local Image Descriptors into Compact Codes , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Cor J. Veenman,et al.  Visual Word Ambiguity , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[30]  N. Higham Computing the polar decomposition with applications , 1986 .

[31]  Nicholas J. Higham,et al.  A Schur-Parlett Algorithm for Computing Matrix Functions , 2003, SIAM J. Matrix Anal. Appl..

[32]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[33]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[34]  A. Baker Matrix Groups: An Introduction to Lie Group Theory , 2003 .

[35]  Liyu Gong,et al.  Shape of Gaussians as feature descriptors , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  William K. Pratt,et al.  Digital Image Processing, 4th Edition , 2007, J. Electronic Imaging.

[37]  Andrew Zisserman,et al.  Three things everyone should know to improve object retrieval , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Edward H. Adelson,et al.  Material perception: What can you see in a brief glance? , 2010 .

[39]  J. Gallier Logarithms and Square Roots of Real Matrices , 2008, 0805.0245.

[40]  Charles R. Johnson,et al.  Matrix analysis , 1985, Statistical Inference for Engineers and Data Scientists.

[41]  Ming Yang,et al.  Mining discriminative co-occurrence patterns for visual recognition , 2011, CVPR 2011.

[42]  Luc Van Gool,et al.  Speeded-Up Robust Features (SURF) , 2008, Comput. Vis. Image Underst..

[43]  Zhuowen Tu,et al.  Harvesting Mid-level Visual Concepts from Large-Scale Internet Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[44]  C. R. Rao,et al.  Information and the Accuracy Attainable in the Estimation of Statistical Parameters , 1992 .

[45]  Xavier Pennec,et al.  A Riemannian Framework for Tensor Computing , 2005, International Journal of Computer Vision.

[46]  D. Le Bihan,et al.  Diffusion tensor imaging: Concepts and applications , 2001, Journal of magnetic resonance imaging : JMRI.

[47]  Tieniu Tan,et al.  Feature Coding in Image Classification: A Comprehensive Study , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48]  Miroslav Lovric,et al.  Multivariate Normal Distributions Parametrized as a Riemannian Symmetric Space , 2000 .

[49]  Matti Pietikäinen,et al.  Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[50]  Gaurav Sharma,et al.  Local Higher-Order Statistics (LHS) for Texture Categorization and Facial Analysis , 2012, ECCV.

[51]  Yan Ke,et al.  PCA-SIFT: a more distinctive representation for local image descriptors , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[52]  Krista A. Ehinger,et al.  SUN database: Large-scale scene recognition from abbey to zoo , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[53]  Lei Zhang,et al.  A Novel Earth Mover's Distance Methodology for Image Matching with Gaussian Mixture Models , 2013, 2013 IEEE International Conference on Computer Vision.

[54]  Limin Wang,et al.  Boosting VLAD with Supervised Dictionary Learning and High-Order Statistics , 2014, ECCV.

[55]  E. Jaynes Information Theory and Statistical Mechanics , 1957 .

[56]  Andrew Zisserman,et al.  Efficient additive kernels via explicit feature maps , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[57]  Josep M. Oller,et al.  A distance between multivariate normal distributions based in an embedding into the Siegel group , 1990 .

[58]  Iasonas Kokkinos,et al.  Describing Textures in the Wild , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[59]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[60]  Wilhelm Burger,et al.  Digital Image Processing - An Algorithmic Introduction using Java , 2008, Texts in Computer Science.

[61]  Takumi Kobayashi,et al.  Dirichlet-Based Histogram Feature Transform for Image Classification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[62]  Xiangyang Xue,et al.  Learning Hybrid Part Filters for Scene Recognition , 2012, ECCV.

[63]  Yasuo Kuniyoshi,et al.  Global Gaussian approach for scene categorization using information geometry , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[64]  Fatih Murat Porikli,et al.  Region Covariance: A Fast Descriptor for Detection and Classification , 2006, ECCV.

[65]  L. Skovgaard A Riemannian geometry of the multivariate normal model , 1984 .

[66]  Cordelia Schmid,et al.  A Comparison of Affine Region Detectors , 2005, International Journal of Computer Vision.

[67]  Matti Pietikäinen,et al.  IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2009, TPAMI-2008-09-0620 1 WLD: A Robust Local Image Descriptor , 2022 .

[68]  Hai Tao,et al.  A novel feature descriptor invariant to complex brightness changes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.