Stochastic Maximum Likelihood Optimization via Hypernetworks

This work explores maximum likelihood optimization of neural networks through hypernetworks. A hypernetwork outputs the weights of another network, which in turn can be used for standard tasks such as regression and classification. We optimize hypernetworks to directly maximize the conditional likelihood of target variables given inputs. Using this approach we obtain competitive empirical results on regression and classification benchmarks.
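The idea above can be illustrated with a minimal sketch: a linear hypernetwork maps a noise vector to the weight of a one-parameter regression network, and both are trained by gradient descent on the Gaussian negative log-likelihood of the targets (which, with unit observation variance, reduces to squared error up to constants). All dimensions, learning rates, and the toy data here are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = 2x + noise (hypothetical example).
X = rng.normal(size=(128, 1))
Y = 2.0 * X + 0.1 * rng.normal(size=(128, 1))

# Hypernetwork: a linear map from a noise vector z to the target
# network's single weight w.  Target network: y_hat = w * x.
A = 0.1 * rng.normal(size=(1, 4))   # hypernetwork weight matrix
b = np.zeros((1, 1))                # hypernetwork bias
lr = 0.05

for step in range(500):
    z = rng.normal(size=(4, 1))     # fresh noise sample each step
    w = A @ z + b                   # generated target-network weight
    resid = X * w - Y               # prediction error on the batch
    # Gradient of the (unit-variance Gaussian) NLL w.r.t. w,
    # then chain rule back to the hypernetwork parameters:
    # dw/dA = z^T and dw/db = 1.
    grad_w = float((X * resid).mean())
    A -= lr * grad_w * z.T
    b -= lr * grad_w

# Sample a fresh weight from the trained hypernetwork; the generated
# weight should lie near the true slope 2.0.
z_test = rng.normal(size=(4, 1))
print("generated weight:", float(A @ z_test + b))
```

Because the noise-dependent part of the weight only adds variance to the loss, training drives `A` toward zero and `b` toward the maximum-likelihood slope; richer hypernetworks retain noise dependence and induce a distribution over weights.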
