Learning Discriminative αβ-divergence for Positive Definite Matrices

Symmetric positive definite (SPD) matrices are useful for capturing second-order statistics of visual data. Several measures are available for comparing two SPD matrices, such as the affine-invariant Riemannian metric, the Jeffreys divergence, and the Jensen-Bregman logdet divergence; however, their behavior can be application dependent, so the measure must be selected manually to achieve the best possible performance. Furthermore, because these measures are computationally expensive for large-scale problems, computing pairwise similarities through a suitable embedding of SPD matrices is often preferred to using the measures directly. In this paper, we propose a discriminative metric learning framework, Information Divergence and Dictionary Learning (IDDL), that not only learns application-specific measures on SPD matrices automatically, but also embeds them as vectors using a learned dictionary. To learn the similarity measures (which could potentially be distinct for every dictionary atom), we use the recently introduced alpha-beta-logdet divergence, which is known to unify the measures listed above. We propose a novel IDDL objective that learns the parameters of the divergence and the dictionary atoms jointly in a discriminative setup and is solved efficiently using Riemannian optimization. We present extensive experiments on eight computer vision datasets, demonstrating state-of-the-art performance.
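To make the unifying role of the alpha-beta-logdet divergence concrete, the following is a minimal sketch, not the IDDL implementation. It assumes the standard form from the alpha-beta log-det divergence literature, D(P‖Q) = (1/(αβ)) log det[(α(PQ⁻¹)^β + β(PQ⁻¹)^(−α))/(α+β)], which the abstract itself does not state, and it restricts to α > 0 and β > 0 for simplicity. The function names `ab_logdet_divergence` and `jbld` are hypothetical.

```python
import numpy as np


def ab_logdet_divergence(P, Q, alpha, beta):
    """Alpha-beta log-det divergence between SPD matrices P and Q.

    Sketch only: restricted to alpha > 0 and beta > 0 (an assumption made
    here so that the logarithm's argument is always positive).
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("this sketch only handles alpha > 0 and beta > 0")
    # Eigenvalues of P Q^{-1} equal those of the symmetric matrix
    # L^{-1} P L^{-T}, where Q = L L^T is the Cholesky factorization.
    L = np.linalg.cholesky(Q)
    Linv_P = np.linalg.solve(L, P)          # L^{-1} P
    M = np.linalg.solve(L, Linv_P.T)        # L^{-1} P L^{-T} (P is symmetric)
    M = (M + M.T) / 2                       # remove round-off asymmetry
    lam = np.linalg.eigvalsh(M)
    terms = (alpha * lam**beta + beta * lam**(-alpha)) / (alpha + beta)
    return float(np.sum(np.log(terms)) / (alpha * beta))


def jbld(P, Q):
    """Jensen-Bregman logdet divergence, used here only as a reference."""
    _, logdet_mid = np.linalg.slogdet((P + Q) / 2)
    _, logdet_p = np.linalg.slogdet(P)
    _, logdet_q = np.linalg.slogdet(Q)
    return float(logdet_mid - 0.5 * (logdet_p + logdet_q))


def random_spd(dim, rng):
    """Hypothetical helper: a well-conditioned random SPD matrix."""
    A = rng.standard_normal((dim, dim))
    return A @ A.T + dim * np.eye(dim)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P, Q = random_spd(5, rng), random_spd(5, rng)
    # With alpha = beta = 1/2 the divergence is 4 times the JBLD.
    print(ab_logdet_divergence(P, Q, 0.5, 0.5), 4 * jbld(P, Q))
```

Under this form, α = β = 1/2 reproduces the Jensen-Bregman logdet divergence up to a factor of 4, and letting α and β both tend to zero approaches half the squared affine-invariant Riemannian distance, which illustrates why learning (α, β) can replace manual selection among these measures.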
