The Dynamics of Learning: A Random Matrix Approach
[1] V. Marčenko et al. Distribution of eigenvalues for some sets of random matrices, 1967.
[2] J. W. Silverstein et al. Analysis of the limiting spectral distribution of large dimensional random matrices, 1995.
[3] J. W. Silverstein et al. No eigenvalues outside the support of the limiting spectral distribution of large-dimensional sample covariance matrices, 1998.
[4] Vladimir N. Vapnik et al. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.
[5] Peter L. Bartlett et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results, 2003, J. Mach. Learn. Res.
[6] T. Poggio et al. General conditions for predictivity in learning theory, 2004, Nature.
[7] S. Péché et al. Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices, 2004, arXiv:math/0403022.
[8] Z. Bai et al. CLT for linear spectral statistics of large dimensional sample covariance matrices with dependent data, 2017, Statistical Papers.
[9] Yann LeCun et al. The MNIST database of handwritten digits, 2005.
[10] Christopher M. Bishop. Pattern Recognition and Machine Learning, 2006, Springer.
[11] W. Hachem et al. Deterministic equivalents for certain functionals of large random matrices, 2005, arXiv:math/0507172.
[12] Y. Yao et al. On Early Stopping in Gradient Descent Learning, 2007.
[13] J. W. Silverstein et al. Spectral Analysis of Large Dimensional Random Matrices, 2009.
[14] Raj Rao Nadakuditi et al. The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices, 2009, arXiv:0910.2120.
[15] Yoshua Bengio et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[16] Léon Bottou et al. Large-Scale Machine Learning with Stochastic Gradient Descent, 2010, COMPSTAT.
[17] R. Couillet et al. Random Matrix Methods for Wireless Communications: Estimation, 2011.
[18] Geoffrey E. Hinton et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[19] Springer Science+Business Media, 2013.
[20] J. Norris. Appendix: probability and measure, 1997.
[21] Nitish Srivastava et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.
[22] Surya Ganguli et al. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks, 2013, ICLR.
[23] Jürgen Schmidhuber et al. Deep learning in neural networks: An overview, 2014, Neural Networks.
[24] Sergey Ioffe et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[25] R. Couillet et al. Spectral analysis of the Gram matrix of mixture models, 2015, arXiv:1510.03463.
[26] Yann LeCun et al. The Loss Surfaces of Multilayer Networks, 2014, AISTATS.
[27] I. S. Gradshteyn and I. M. Ryzhik. Table of Integrals, Series, and Products, 2016.
[28] Samy Bengio et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[29] Jorge Nocedal et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima, 2016, ICLR.
[30] Andrew M. Saxe et al. High-dimensional dynamics of generalization error in neural networks, 2017, Neural Networks.