Deep Learning Model Selection of Suboptimal Complexity

We consider the problem of selecting deep learning models of suboptimal complexity. The complexity of a model is defined as the minimum description length of the combination of the sample and the classification or regression model. Suboptimal complexity is an approximate estimate of this minimum description length, obtained with Bayesian inference and variational methods. We introduce probabilistic assumptions about the distribution of the model parameters. Based on Bayesian inference, we construct the likelihood function of the model. To estimate the likelihood, we apply variational methods with gradient optimization algorithms. We evaluate the approach in a computational experiment on several samples.
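The connection between variational inference and description length can be made explicit. The following derivation is a standard sketch (the bits-back argument of Hinton and van Camp, as used in practical variational inference for neural networks), not necessarily this paper's exact formulation; the notation, with f denoting the model and w its parameters, is assumed here:

\[
\log p(\mathbf{y} \mid \mathbf{X}, f)
= \log \int p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}, f)\, p(\mathbf{w} \mid f)\, d\mathbf{w}
\;\geq\; \mathbb{E}_{q(\mathbf{w})}\!\bigl[\log p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}, f)\bigr]
- D_{\mathrm{KL}}\bigl(q(\mathbf{w}) \,\|\, p(\mathbf{w} \mid f)\bigr).
\]

The negative of the right-hand side reads as a two-part code length: the expected number of nats needed to encode the data given the parameters, plus the number of nats needed to encode the parameters under the prior. Minimizing this negative evidence lower bound over q therefore yields the approximate, i.e. suboptimal, estimate of the minimum description length, and the model with the smallest bound is selected.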

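Below is a minimal sketch of how such a bound can be optimized with gradient methods, assuming a fully factorized Gaussian variational posterior, a standard normal prior, and a toy 1-8-1 regression network. The network shape, noise variance, and synthetic data are illustrative assumptions, not this paper's experimental setup.

# Minimal variational inference sketch: q(w) = N(mu, diag(sigma^2)),
# prior p(w) = N(0, I). All specifics (1-8-1 net, toy data, hyperparameters)
# are illustrative assumptions.
import torch

torch.manual_seed(0)

# Toy regression sample: y = sin(x) + noise.
x = torch.linspace(-3, 3, 100).unsqueeze(1)
y = torch.sin(x) + 0.1 * torch.randn_like(x)

n_weights = 1 * 8 + 8 + 8 * 1 + 1  # 1-8-1 network: weights plus biases
mu = torch.zeros(n_weights, requires_grad=True)
rho = torch.full((n_weights,), -3.0, requires_grad=True)  # sigma = softplus(rho)

def forward(w, x):
    # Unpack the flat weight vector into the 1-8-1 network parameters.
    w1 = w[:8].view(1, 8); b1 = w[8:16]
    w2 = w[16:24].view(8, 1); b2 = w[24]
    return torch.tanh(x @ w1 + b1) @ w2 + b2

opt = torch.optim.Adam([mu, rho], lr=1e-2)
for step in range(3000):
    sigma = torch.nn.functional.softplus(rho)
    w = mu + sigma * torch.randn(n_weights)  # reparameterization trick
    # Gaussian log-likelihood (up to a constant), noise variance 0.01 assumed.
    log_lik = -0.5 * ((forward(w, x) - y) ** 2 / 0.01).sum()
    # Closed-form KL between N(mu, sigma^2) and N(0, 1), summed over weights.
    kl = 0.5 * (mu ** 2 + sigma ** 2 - 2 * torch.log(sigma) - 1).sum()
    loss = -(log_lik - kl)  # negative evidence lower bound
    opt.zero_grad(); loss.backward(); opt.step()

print(f"negative ELBO (nats) after training: {loss.item():.1f}")

A model selection experiment along these lines would repeat the optimization for candidate models of different width or depth and prefer the candidate with the smallest negative evidence lower bound, i.e. the shortest approximate description length.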