q-Paths: Generalizing the Geometric Annealing Path using Power Means

Many common machine learning methods involve the geometric annealing path, a sequence of intermediate densities between two distributions of interest constructed using the geometric average. While alternatives such as the moment-averaging path have demonstrated performance gains in some settings, their practical applicability remains limited by exponential family endpoint assumptions and the lack of a closed-form energy function. In this work, we introduce q-paths, a family of paths that is derived from a generalized notion of the mean, includes the geometric and arithmetic mixtures as special cases, and admits a simple closed form involving the deformed logarithm function from nonextensive thermodynamics. Following previous analysis of the geometric path, we interpret our q-paths as corresponding to a q-exponential family of distributions, and provide a variational representation of intermediate densities as minimizing a mixture of α-divergences to the endpoints. We show that small deviations away from the geometric path yield empirical gains for Bayesian inference using Sequential Monte Carlo and generative model evaluation using Annealed Importance Sampling.
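For concreteness, the power-mean construction behind q-paths can be written out as follows. Using the deformed (Tsallis) logarithm ln_q(u) = (u^(1−q) − 1)/(1 − q), which recovers the natural logarithm as q → 1, the q-path between unnormalized endpoint densities π̃_0 and π̃_1 is the power mean of order 1 − q:

π̃_{β,q}(x) = [(1 − β) π̃_0(x)^(1−q) + β π̃_1(x)^(1−q)]^(1/(1−q)),

which reduces to the geometric path π̃_0^(1−β) π̃_1^β as q → 1 and to the arithmetic mixture at q = 0. The snippet below is a minimal numerical sketch of this formula, not the paper's reference implementation; the function name and the log-space stabilization via logsumexp are our own choices.

```python
import numpy as np
from scipy.special import logsumexp

def log_qpath(log_pi0, log_pi1, beta, q):
    """Unnormalized log-density of the q-path at annealing parameter beta.

    Implements the power mean of order (1 - q):
        pi_beta(x) = [(1 - beta) * pi0(x)**(1 - q)
                      + beta * pi1(x)**(1 - q)] ** (1 / (1 - q)),
    computed in log space with logsumexp for numerical stability.
    """
    log_pi0 = np.asarray(log_pi0, dtype=float)
    log_pi1 = np.asarray(log_pi1, dtype=float)
    if np.isclose(q, 1.0):
        # q -> 1 limit: the geometric path pi0**(1 - beta) * pi1**beta.
        return (1.0 - beta) * log_pi0 + beta * log_pi1
    # Stack the two endpoint terms and take a weighted logsumexp over them.
    terms = np.stack([(1.0 - q) * log_pi0, (1.0 - q) * log_pi1])
    weights = np.array([1.0 - beta, beta]).reshape(2, *([1] * log_pi0.ndim))
    return logsumexp(terms, b=weights, axis=0) / (1.0 - q)

# Example: a q-path between two unnormalized Gaussian log-densities.
x = np.linspace(-5.0, 5.0, 5)
log_pi0 = -0.5 * x**2              # proportional to N(0, 1)
log_pi1 = -0.5 * (x - 2.0) ** 2    # proportional to N(2, 1)
print(log_qpath(log_pi0, log_pi1, beta=0.5, q=0.9))   # near-geometric path
print(log_qpath(log_pi0, log_pi1, beta=0.5, q=0.0))   # arithmetic mixture
```

Choosing q slightly below 1 yields the small deviations from the geometric path that the abstract reports as empirically beneficial.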
