Principled Weight Initialization for Hypernetworks

Hypernetworks are meta neural networks that generate the weights of a main neural network (the mainnet) in an end-to-end differentiable manner. Despite extensive applications ranging from multi-task learning to Bayesian deep learning, the problem of optimizing hypernetworks has not been studied to date. We observe that classical weight initialization methods such as those of Glorot & Bengio (2010) and He et al. (2015), when applied directly to a hypernet, fail to produce mainnet weights in the correct scale. We develop principled techniques for weight initialization in hypernets, and show that they lead to more stable mainnet weights, lower training loss, and faster convergence.
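
To make the scale mismatch concrete, here is a minimal NumPy sketch, not the paper's construction: a one-layer linear hypernet maps a 64-dimensional layer embedding to a 256x256 mainnet weight matrix. The embedding e, the layer sizes, and the fan-in variance target of 1/n_in are illustrative assumptions; with a standard fan-in (He-style) initialization applied directly to the hypernet, the generated mainnet weights come out orders of magnitude too large, while rescaling the hypernet's output layer to match the mainnet's fan-in restores the intended scale.

```python
import numpy as np

rng = np.random.default_rng(0)

# Mainnet layer to be generated: W has shape (n_out, n_in), so its fan-in is n_in.
n_in, n_out = 256, 256
d = 64                                    # hypernet input (embedding) size, illustrative

e = rng.normal(0.0, 1.0, size=d)          # layer embedding, assumed roughly unit-variance

# Naive: fan-in (He-style) init applied directly to the hypernet's output layer,
# treating it as an ordinary d -> (n_in * n_out) linear map.
H_naive = rng.normal(0.0, np.sqrt(2.0 / d), size=(n_in * n_out, d))
W_naive = (H_naive @ e).reshape(n_out, n_in)

# Variance-matched alternative: choose the hypernet output scale so the *generated*
# mainnet weights hit a fan-in variance target of 1/n_in.
# (Hedged stand-in for the paper's initialization, not its exact formula.)
target_var = 1.0 / n_in
H_scaled = rng.normal(0.0, np.sqrt(target_var / (d * e.var())),
                      size=(n_in * n_out, d))
W_scaled = (H_scaled @ e).reshape(n_out, n_in)

print(f"target mainnet weight variance: {target_var:.1e}")     # ~3.9e-03
print(f"naive hypernet init           : {W_naive.var():.1e}")  # ~2e+00, far too large
print(f"variance-matched init         : {W_scaled.var():.1e}") # matches the target
```

The point of the sketch is that the variance of the generated weights depends on the hypernet's own fan-in d and on the embedding, not on the mainnet layer's fan-in, so an initialization that accounts for the hypernet structure is needed to put the mainnet weights in the correct scale.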

[1] Agustinus Kristiadi et al. Predictive Uncertainty Quantification with Compound Density Networks, 2019, arXiv.

[2] Kenneth O. Stanley et al. A Hypercube-Based Encoding for Evolving Large-Scale Neural Networks, 2009, Artificial Life.

[3] Timothy M. Hospedales et al. Hypernetwork Knowledge Graph Embeddings, 2018, ICANN.

[4] Xavier Glorot, Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.

[5] Fuxin Li et al. HyperGAN: A Generative Model for Diverse, Performant Neural Networks, 2019, ICML.

[6] Takayuki Okatani et al. HyperNetworks with statistical filtering for defending adversarial examples, 2017, arXiv.

[7] Xiangyu Zhang et al. MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning, 2019, ICCV.

[8] Jürgen Schmidhuber et al. Evolving neural networks in compressed weight space, 2010, GECCO.

[9] Jacek Tabor et al. Hypernetwork Functional Image Representation, 2019, ICANN.

[10] Elliot Meyerson et al. Modular Universal Reparameterization: Deep Multi-task Learning Across Diverse Domains, 2019, NeurIPS.

[11] Yee Whye Teh et al. A Fast Learning Algorithm for Deep Belief Nets, 2006, Neural Computation.

[12] Geoffrey E. Hinton et al. Learning representations by back-propagating errors, 1986, Nature.

[13] Thomas Brox et al. Striving for Simplicity: The All Convolutional Net, 2014, ICLR.

[14] Benjamin F. Grewe et al. Continual learning with hypernetworks, 2019, ICLR.

[15] Benjamin F. Grewe et al. Approximating the Predictive Distribution via Adversarially-Trained Hypernetworks, 2018.

[16] Joseph Suarez et al. Language Modeling with Recurrent Highway Hypernetworks, 2017, NIPS.

[17] Yoshua Bengio et al. Greedy Layer-Wise Training of Deep Networks, 2006, NIPS.

[18] Tengyu Ma et al. Fixup Initialization: Residual Learning Without Normalization, 2019, ICLR.

[19] Raquel Urtasun et al. Graph HyperNetworks for Neural Architecture Search, 2018, ICLR.

[20] Surya Ganguli et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, 2013, ICLR.

[21] Takashi Matsubara et al. Hypernetwork-based Implicit Posterior Estimation and Model Averaging of CNN, 2018, ACML.

[22] Theodore Lim et al. SMASH: One-Shot Model Architecture Search through HyperNetworks, 2017, ICLR.

[23] Joan Serra et al. Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion, 2019, NeurIPS.

[24] Nathan Srebro et al. The Marginal Value of Adaptive Gradient Methods in Machine Learning, 2017, NIPS.

[25] Bo Chen et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, 2017, arXiv.

[26] Kaiming He et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, ICCV.

[27] Sören Laue et al. Computing Higher Order Derivatives of Matrix and Tensor Expressions, 2018, NeurIPS.

[28] Jimmy Ba et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[29] Shuicheng Yan et al. Meta Networks for Neural Style Transfer, 2017, arXiv.

[30] Sanjiv Kumar et al. On the Convergence of Adam and Beyond, 2018.

[31] Yong Yu et al. HyperST-Net: Hypernetworks for Spatio-Temporal Forecasting, 2018, arXiv.

[32] Ben Glocker et al. Implicit Weight Uncertainty in Neural Networks, 2017.

[33] Erik Nijkamp et al. A Generative Model for Sampling High-Performance and Diverse Weights for Neural Networks, 2019, arXiv.

[34] David Duvenaud et al. Stochastic Hyperparameter Optimization through Hypernetworks, 2018, arXiv.