Approximate Empirical Bayes for Deep Neural Networks

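$$
\min_{W,a}\ \min_{\Omega_r,\Omega_c}\ \frac{1}{2}\,\|\hat{y}(W,a) - y\|_2^2 + \lambda\,\|\Omega_r^{1/2} W \Omega_c^{1/2}\|_F^2 - \lambda\left(d \log\det(\Omega_r) + p \log\det(\Omega_c)\right)
$$
$$
\text{subject to}\quad uI_p \preceq \Omega_r \preceq vI_p,\qquad uI_d \preceq \Omega_c \preceq vI_d
$$

• $\Omega_r := \Sigma_r^{-1}$ and $\Omega_c := \Sigma_c^{-1}$ are the corresponding precision matrices.

Block Coordinate Descent Algorithm:

Input: Initial value $w_0 := \{a^{(0)}, W^{(0)}\}$, $\Omega_r^{(0)}$ and $\Omega_c^{(0)}$, first-order optimization algorithm $\mathcal{A}$, constants $0 < u \le v$.
1: for $t = 1, \ldots, \infty$ until convergence do
2:     Fix $\Omega_r^{(t-1)}$, $\Omega_c^{(t-1)}$, optimize $w^{(t)}$ by backpropagation and algorithm $\mathcal{A}$
3:     $\Omega_r^{(t)} \leftarrow \mathrm{InvThresholding}(W^{(t)} \Omega_c^{(t-1)} W^{(t)T}, d, u, v)$
4:     $\Omega_c^{(t)} \leftarrow \mathrm{InvThresholding}(W^{(t)T} \Omega_r^{(t)} W^{(t)}, p, u, v)$
5: end for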
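The precision-matrix updates in steps 3 and 4 of the algorithm can be sketched in NumPy. The text above does not define InvThresholding, so this is only one plausible reading. It assumes the routine eigendecomposes its symmetric argument, maps each eigenvalue r to m / r, and clips the result to [u, v]. That is the closed-form minimizer when the regularizer is the trace form tr(Ω_r W Ω_c Wᵀ) with W of shape p × d. The names `inv_thresholding` and `update_precisions` and the small eigenvalue floor are hypothetical choices made for illustration.

```python
import numpy as np

def inv_thresholding(delta, m, u, v):
    """Assumed form of InvThresholding: invert the spectrum of delta, scaled by m,
    then clip the resulting eigenvalues to the interval [u, v]."""
    # Symmetrize to remove round-off asymmetry before the eigendecomposition.
    delta = 0.5 * (delta + delta.T)
    r, q = np.linalg.eigh(delta)
    # Floor tiny or slightly negative eigenvalues so m / r is finite;
    # such directions end up clipped to the upper bound v.
    r = np.maximum(r, 1e-12)
    r_new = np.clip(m / r, u, v)
    # Rebuild the matrix with the original eigenvectors and the new eigenvalues.
    return (q * r_new) @ q.T

def update_precisions(W, omega_c, u, v):
    """One pass over steps 3 and 4 for a fixed weight matrix W of shape (p, d)."""
    p, d = W.shape
    omega_r = inv_thresholding(W @ omega_c @ W.T, d, u, v)   # step 3, p x p
    omega_c = inv_thresholding(W.T @ omega_r @ W, p, u, v)   # step 4, d x d
    return omega_r, omega_c

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 3))
    omega_r, omega_c = update_precisions(W, np.eye(3), u=0.1, v=10.0)
    print(np.linalg.eigvalsh(omega_r))
    print(np.linalg.eigvalsh(omega_c))
```

In a full training loop, step 2 (optimizing the weights by backpropagation with Ω_r and Ω_c held fixed) would run between successive calls to `update_precisions`; it is omitted here because it depends on the network and the optimizer.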

Geoffrey J. Gordon | Yao-Hung Hubert Tsai | R. Salakhutdinov | Han Zhao
