Improving the Identifiability of Neural Networks for Bayesian Inference
Accurate inference of the parameters in the highly complex and multi-modal likelihoods of neural networks (NNs) is incredibly difficult for any algorithm. In part, this challenge is caused by the significant over-parameterization of the model, which yields many equivalent solutions and thus a model unidentifiability problem. In this paper, we explore the unidentifiability problem for NNs as it manifests in two ways: arbitrary permutations of the hidden nodes, which we denote as weight-space symmetry, and arbitrary scaling under rectified linear-unit (ReLU) nonlinearities, which we denote as scaling symmetry. We show how these unidentifiabilities pose issues for both Markov chain Monte Carlo (MCMC) and variational inference (VI). Finally, we introduce two reparameterizations of the model in the form of parameter constraints and prove that they resolve the aforementioned unidentifiability issues, demonstrating them in experiments and offering implementations in the form of coordinate transforms.
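The two symmetries the abstract names can be checked numerically. The sketch below (not from the paper's code; the network shapes and random values are illustrative assumptions) shows that for a one-hidden-layer ReLU network, permuting the hidden units and rescaling each unit's incoming weights by c > 0 while dividing its outgoing weights by c both leave the function unchanged:

```python
import numpy as np

# A one-hidden-layer ReLU network: f(x) = W2 @ relu(W1 @ x + b1) + b2.
# Sizes (3 inputs, 4 hidden units, 2 outputs) are arbitrary for illustration.
rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)

W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

def forward(W1, b1, W2, b2, x):
    return W2 @ relu(W1 @ x + b1) + b2

y = forward(W1, b1, W2, b2, x)

# Weight-space symmetry: permute the hidden units (rows of W1 and b1)
# and apply the same permutation to the columns of W2.
perm = rng.permutation(4)
y_perm = forward(W1[perm], b1[perm], W2[:, perm], b2, x)

# Scaling symmetry: for c > 0, relu(c * z) = c * relu(z), so multiplying
# a unit's incoming weights and bias by c while dividing its outgoing
# weights by c leaves the output unchanged.
c = np.abs(rng.normal(size=4)) + 0.1  # strictly positive per-unit scales
y_scale = forward(W1 * c[:, None], b1 * c, W2 / c[None, :], b2, x)

print(np.allclose(y, y_perm), np.allclose(y, y_scale))  # both True
```

Because infinitely many parameter settings produce the identical function, a posterior over the weights has infinitely many equivalent modes, which is the source of the trouble for MCMC and VI described above.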