[1] Edward J. Hu, et al. Tensor Programs IV: Feature Learning in Infinite-Width Neural Networks, 2021, ICML.
[2] Quoc V. Le, et al. Don't Decay the Learning Rate, Increase the Batch Size, 2017, ICLR.
[3] Roger B. Grosse, et al. A Kronecker-factored approximate Fisher matrix for convolution layers, 2016, ICML.
[4] Jaehoon Lee, et al. Deep Neural Networks as Gaussian Processes, 2017, ICLR.
[5] Richard H. Bartels, et al. Algorithm 432 [C2]: Solution of the matrix equation AX + XB = C [F4], 1972, Commun. ACM.
[6] Guodong Zhang, et al. Noisy Natural Gradient as Variational Inference, 2017, ICML.
[7] Roger B. Grosse, et al. Optimizing Neural Networks with Kronecker-factored Approximate Curvature, 2015, ICML.
[8] Sebastian W. Ober, et al. A variational approximate posterior for the deep Wishart process, 2021, ArXiv.
[9] Héctor J. Sussmann, et al. Uniqueness of the weights for minimal feedforward nets with a given input-output map, 1992, Neural Networks.
[10] David J. C. MacKay, et al. A Practical Bayesian Framework for Backpropagation Networks, 1992, Neural Computation.
[11] Muni S. Srivastava, et al. Singular Wishart and multivariate beta distributions, 2003.
[12] Lawrence K. Saul, et al. Kernel Methods for Deep Learning, 2009, NIPS.
[13] H. Uhlig. On singular Wishart and singular multivariate beta distributions, 1994.
[14] D. Kleinman. On an iterative technique for Riccati equation computations, 1968.
[15] Heekuck Oh, et al. Neural Networks for Pattern Recognition, 1993, Adv. Comput.
[16] Guillaume Hennequin, et al. Exact natural gradient in deep linear networks and its application to the nonlinear case, 2018, NeurIPS.
[17] Leslie N. Smith, et al. Cyclical Learning Rates for Training Neural Networks, 2015, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).
[18] Richard E. Turner, et al. Gaussian Process Behaviour in Wide Deep Neural Networks, 2018, ICLR.
[19] Adam X. Yang, et al. Deep kernel processes, 2020, ICML.
[20] Neil D. Lawrence, et al. Deep Gaussian Processes, 2012, AISTATS.
[21] Tom Minka, et al. Expectation Propagation for approximate Bayesian inference, 2001, UAI.
[22] Abdulkadir Canatar, et al. Asymptotics of representation learning in finite Bayesian neural networks, 2021, ArXiv.
[23] Zoubin Ghahramani, et al. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning, 2015, ICML.
[24] James Hensman, et al. Natural Gradients in Practice: Non-Conjugate Variational Inference in Gaussian Process Models, 2018, AISTATS.
[25] Marc Peter Deisenroth, et al. Doubly Stochastic Variational Inference for Deep Gaussian Processes, 2017, NIPS.
[26] Laurence Aitchison. Why bigger is not always better: on finite and infinite neural networks, 2020, ICML.
[27] Leiba Rodman, et al. Algebraic Riccati equations, 1995.
[28] Shun-ichi Amari. Understand It in Five Minutes!? Skimming Famous Papers: Jacot, Arthur, Gabriel, Franck and Hongler, Clement: Neural Tangent Kernel: Convergence and Generalization in Neural Networks, 2020.
[29] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[30] Peter Benner, et al. Computational Methods for Linear-Quadratic Optimization, 1999.
[31] J. A. Díaz-García, et al. On Wishart distribution, 2010, arXiv:1010.1799.
[32] Yanzhao Wu, et al. Demystifying Learning Rate Policies for High Accuracy Training of Deep Neural Networks, 2019, 2019 IEEE International Conference on Big Data (Big Data).
[33] Laurence Aitchison, et al. Deep Convolutional Networks as shallow Gaussian Processes, 2018, ICLR.
[34] Jaehoon Lee, et al. Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes, 2018, ICLR.
[35] Taras Bodnar, et al. Properties of the singular, inverse and generalized inverse partitioned Wishart distributions, 2008.
[36] Dan A. Simovici, et al. Bayesian Learning, 2019, Variational Bayesian Learning Theory.
[37] Ryan P. Adams, et al. Avoiding pathologies in very deep networks, 2014, AISTATS.
[38] Philipp Hennig, et al. Descending through a Crowded Valley - Benchmarking Deep Learning Optimizers, 2020, ICML.
[39] Razvan Pascanu, et al. On the difficulty of training recurrent neural networks, 2012, ICML.