Deep Prior

The recent literature on deep learning offers new tools to learn a rich probability distribution over high-dimensional data such as images or sounds. In this work we investigate the possibility of learning the prior distribution over neural network parameters using such tools. Our resulting variational Bayes algorithm generalizes well to new tasks, even when very few training examples are provided. Furthermore, this learned prior allows the model to extrapolate correctly far from a given task's training data on a meta-dataset of periodic signals.

1 Learning a Rich Prior

Bayesian Neural Networks [1, 2, 3, 4] are now scalable and can be used to estimate prediction uncertainty and model uncertainty [5]. While many efforts focus on better approximations of the posterior, we believe that the quality of the uncertainty estimates depends heavily on the choice of the prior. Hence, we consider learning a prior from previous tasks: we learn a probability distribution p(w|α) over the weights w of a network, parameterized by α, and leverage this learned prior to reduce sample complexity on new tasks. More formally, we consider a hierarchical Bayes approach across N tasks, with hyper-prior p(α). Each task j has its own parameters w_j, with W = {w_j}_{j=1}^N. Using all datasets D = {S_j}_{j=1}^N, we have the following posterior:

p(W, α | D) = p(α | D) ∏_j p(w_j | α, S_j) ∝ p(D | W) p(W | α) p(α) ∝ p(α) ∏_j p(S_j | w_j) p(w_j | α)
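To make the hierarchical setup concrete, the sketch below is an illustrative Python/PyTorch example (not the paper's implementation; the variable names, the diagonal-Gaussian parameterization, and the toy linear likelihood are our own assumptions). It trains a shared Gaussian prior p(w|α) jointly with one factorized-Gaussian variational posterior q_j(w) per task by maximizing the per-task ELBO, E_{q_j}[log p(S_j | w)] − KL(q_j(w) || p(w | α)).

```python
# Minimal sketch: learn a shared diagonal-Gaussian prior p(w | alpha) over
# network weights across tasks, with one variational posterior q_j(w) per task.
# Names, sizes, and the toy linear-Gaussian likelihood are illustrative only.
import torch
from torch import nn
from torch.distributions import Normal, kl_divergence

D = 50        # number of (flattened) weights, illustrative size
n_tasks = 3   # number of toy tasks

# Learned prior p(w | alpha): alpha = (prior_mu, prior_log_sigma).
prior_mu = nn.Parameter(torch.zeros(D))
prior_log_sigma = nn.Parameter(torch.zeros(D))

# Factorized Gaussian variational posterior q_j(w) for each task j.
post_mu = nn.Parameter(torch.zeros(n_tasks, D))
post_log_sigma = nn.Parameter(torch.full((n_tasks, D), -2.0))

def log_likelihood(w, x, y):
    """Toy likelihood: w as weights of a linear model with Gaussian noise."""
    pred = x @ w
    return Normal(pred, 0.1).log_prob(y).sum()

def elbo(task_id, x, y, n_samples=5):
    """Per-task ELBO: E_q[log p(S_j | w)] - KL(q_j(w) || p(w | alpha))."""
    prior = Normal(prior_mu, prior_log_sigma.exp())
    q = Normal(post_mu[task_id], post_log_sigma[task_id].exp())
    # Reparameterized Monte Carlo estimate of the expected log-likelihood.
    ll = torch.stack([log_likelihood(q.rsample(), x, y)
                      for _ in range(n_samples)]).mean()
    return ll - kl_divergence(q, prior).sum()

params = [prior_mu, prior_log_sigma, post_mu, post_log_sigma]
opt = torch.optim.Adam(params, lr=1e-2)

# Toy data per task; in practice S_j would be that task's training set.
data = [(torch.randn(20, D), torch.randn(20)) for _ in range(n_tasks)]

for step in range(200):
    opt.zero_grad()
    loss = -sum(elbo(j, x, y) for j, (x, y) in enumerate(data))
    loss.backward()
    opt.step()
```

Under this setup, a new task would be handled by keeping the learned prior parameters α fixed and fitting a fresh variational posterior q(w) to the task's few training examples, which is where the reduction in sample complexity is expected to come from.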