Learning Log-Determinant Divergences for Positive Definite Matrices

Representations in the form of Symmetric Positive Definite (SPD) matrices have been popularized in a variety of visual learning applications due to their demonstrated ability to capture rich second-order statistics of visual data. There exist several similarity measures for comparing SPD matrices with documented benefits. However, selecting an appropriate measure for a given problem remains a challenge and, in most cases, is the result of a trial-and-error process. In this paper, we propose to learn similarity measures in a data-driven manner. To this end, we capitalize on the alpha-beta-log-det divergence, a meta-divergence parametrized by scalars alpha and beta that subsumes a wide family of popular information divergences on SPD matrices for distinct and discrete values of these parameters. Our key idea is to cast these parameters in a continuum and learn them from data. We systematically extend this idea to learn vector-valued parameters, thereby increasing the expressiveness of the underlying non-linear measure. We conjoin the divergence learning problem with several standard tasks in machine learning, including supervised discriminative dictionary learning and unsupervised SPD matrix clustering. We present Riemannian descent schemes for optimizing our formulations efficiently and show the usefulness of our method on eight standard computer vision tasks.
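
To make the role of the scalars alpha and beta concrete, the following is a minimal sketch of the alpha-beta-log-det divergence between two SPD matrices. The abstract does not state the formula, so the definition used here is an assumption taken from the general literature on log-det divergences: D(P‖Q) = (1/(alpha·beta)) · log det[(alpha·(PQ⁻¹)^beta + beta·(PQ⁻¹)^(−alpha)) / (alpha + beta)], valid for alpha, beta and alpha + beta all nonzero. The function name `ab_logdet_divergence` is hypothetical, and the sketch covers only the evaluation of the divergence, not the learning schemes the paper proposes.

```python
import numpy as np


def ab_logdet_divergence(P, Q, alpha, beta):
    """Alpha-beta log-det divergence between SPD matrices P and Q.

    Assumed definition (not given in the abstract), written in terms of the
    eigenvalues lam_i of P Q^{-1}:
        D = 1/(alpha*beta) * sum_i log((alpha*lam_i**beta
                                        + beta*lam_i**(-alpha)) / (alpha+beta))
    Only the generic case alpha != 0, beta != 0, alpha + beta != 0 is handled.
    """
    if alpha == 0 or beta == 0 or alpha + beta == 0:
        raise ValueError("limiting cases (zero alpha, beta, or alpha+beta) are not handled")

    # Eigenvalues of P Q^{-1} equal those of the symmetric matrix
    # L^{-1} P L^{-T}, where Q = L L^T is the Cholesky factorization.
    L = np.linalg.cholesky(Q)
    A = np.linalg.solve(L, P)        # L^{-1} P
    M = np.linalg.solve(L, A.T).T    # L^{-1} P L^{-T}
    lam = np.linalg.eigvalsh((M + M.T) / 2.0)  # symmetrize against round-off

    terms = (alpha * lam**beta + beta * lam**(-alpha)) / (alpha + beta)
    if np.any(terms <= 0):
        # Can happen when alpha and beta have opposite signs.
        raise ValueError("divergence is undefined for these parameters and inputs")
    return float(np.sum(np.log(terms)) / (alpha * beta))


# Illustrative inputs (made up for this sketch).
P = np.array([[2.0, 0.5], [0.5, 1.0]])
Q = np.array([[1.0, 0.2], [0.2, 3.0]])
d_half = ab_logdet_divergence(P, Q, 0.5, 0.5)
```

Under this assumed definition, setting alpha = beta = 0.5 gives four times the Jensen-Bregman log-det divergence, log det((P + Q)/2) − ½ log det(PQ), which illustrates how fixed parameter values recover known measures while intermediate values interpolate between them.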
