Learning Priors for Invariance

Informative priors are often difficult, if not impossible, to elicit for modern large-scale Bayesian models. Yet some prior knowledge is usually available, and it is typically incorporated through engineering tricks or methods less principled than a Bayesian prior; such tricks are difficult to reconcile with principled probabilistic inference. For instance, with data set augmentation the posterior is conditioned on artificial data rather than on what was actually observed. In this paper, we address how to specify an informative prior when the problem of interest is known to exhibit invariance properties. The proposed method is akin to posterior variational inference: we choose a parametric family of priors and optimize to find the member of the family that makes the model robust to a given transformation. We demonstrate the method's utility for dropout and rotation transformations, showing that these priors yield performance competitive with that of non-Bayesian methods. Furthermore, our approach does not require labeled data and can therefore be used in semi-supervised settings.
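The optimization described above can be illustrated with a minimal sketch. The setup below is an assumption for illustration, not the paper's exact procedure: the parametric family is a diagonal-Gaussian prior N(mu, diag(sigma^2)) over the weights of a linear model f(x; w) = w·x, the transformation T is a toy cyclic shift standing in for a rotation, and the objective is the expected invariance penalty E_w[(f(x; w) - f(Tx; w))^2], which for a Gaussian prior has the closed form (mu·d)^2 + sum_i sigma_i^2 d_i^2 with d = x - Tx. Note that the inputs X are used unlabeled, matching the semi-supervised claim.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
X = rng.normal(size=(32, dim))        # unlabeled inputs: no labels are needed
T = np.roll(np.eye(dim), 1, axis=0)   # toy "rotation": a cyclic shift of coordinates

mu = rng.normal(scale=0.1, size=dim)  # prior mean (learnable parameter of the family)
log_sigma = np.zeros(dim)             # prior log-std (learnable parameter of the family)
lr, lam = 0.05, 1e-3                  # step size; small pull keeping the prior near N(0, I)

for _ in range(500):
    D = X - X @ T.T                   # rows are d = x - Tx, shape (32, dim)
    sigma2 = np.exp(2 * log_sigma)
    # Gradient of mean_x[(mu.d)^2] w.r.t. mu, plus shrinkage toward mu = 0.
    g_mu = 2 * (D * (D @ mu)[:, None]).mean(axis=0) + lam * mu
    # Gradient of mean_x[sum_i sigma_i^2 d_i^2] w.r.t. log sigma (chain rule gives
    # the factor 2*sigma2), plus an ad hoc pull of sigma^2 toward 1.
    g_ls = 2 * sigma2 * (D ** 2).mean(axis=0) + lam * (sigma2 - 1)
    mu -= lr * g_mu
    log_sigma -= lr * g_ls

# Closed-form invariance penalty for one input under the learned prior.
d = X[0] - T @ X[0]
penalty = float((mu @ d) ** 2 + np.sum(np.exp(2 * log_sigma) * d ** 2))
print(penalty)
```

After optimization the penalty is driven close to zero: the prior mean becomes nearly orthogonal to the difference directions d, and the prior variance collapses along them, so weights sampled from the learned prior give a model that is approximately invariant to T by construction.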
