Calculating Optimistic Likelihoods Using (Geodesically) Convex Optimization

A fundamental problem arising in many areas of machine learning is the evaluation of the likelihood of a given observation under different nominal distributions. Frequently, these nominal distributions are themselves estimated from data, which makes them susceptible to estimation errors. We thus propose to replace each nominal distribution with an ambiguity set containing all distributions in its vicinity and to evaluate an optimistic likelihood, that is, the maximum of the likelihood over all distributions in the ambiguity set. When the proximity of distributions is quantified by the Fisher-Rao distance or the Kullback-Leibler divergence, the resulting optimistic likelihoods can be computed efficiently using either geodesic or standard convex optimization techniques. We showcase the advantages of working with optimistic likelihoods on a classification problem using both synthetic and empirical data.
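To fix ideas, the optimistic likelihood admits a compact formulation; the display below is a minimal sketch, in which the observation x, the nominal distribution \hat{\mathbb{P}}, the ambiguity radius \rho, and the density q of a candidate distribution \mathbb{Q} are generic placeholders introduced here for illustration rather than the paper's exact parametrization:

\[
\hat{L}(x) \;=\; \max_{\mathbb{Q} \,:\, d(\mathbb{Q},\,\hat{\mathbb{P}}) \,\le\, \rho} \; q(x),
\]

where d denotes either the Fisher-Rao distance or the Kullback-Leibler divergence. For these two choices of d, the abstract states that this maximization can be carried out efficiently with geodesic or standard convex optimization techniques.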
