[1] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Yoshua Bengio, et al. How transferable are features in deep neural networks?, 2014, NIPS.
[3] Shai Shalev-Shwartz, et al. SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data, 2017, ICLR.
[4] Noel E. O'Connor, et al. Unsupervised label noise modeling and loss correction, 2019, ICML.
[5] Yu Feng, et al. How neural networks find generalizable solutions: Self-tuned annealing in deep learning, 2020, arXiv.
[6] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[7] Florent Krzakala, et al. Generalisation error in learning with random features and the hidden manifold model, 2020, ICML.
[8] George Kurian, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016, arXiv.
[9] Léon Bottou, et al. Large-Scale Machine Learning with Stochastic Gradient Descent, 2010, COMPSTAT.
[10] Yoshua Bengio, et al. A Closer Look at Memorization in Deep Networks, 2017, ICML.
[11] Naftali Tishby, et al. Deep learning and the information bottleneck principle, 2015, IEEE Information Theory Workshop (ITW).
[12] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.
[13] Mark B. Ring. Continual learning in reinforcement environments, 1995, GMD-Bericht.
[14] Marc'Aurelio Ranzato, et al. Gradient Episodic Memory for Continual Learning, 2017, NIPS.
[15] Samet Oymak, et al. Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks, 2019, AISTATS.
[16] Gerald Tesauro, et al. Learning to Learn without Forgetting By Maximizing Transfer and Minimizing Interference, 2018, ICLR.
[17] Mikhail Belkin, et al. Reconciling modern machine-learning practice and the classical bias–variance trade-off, 2018, Proceedings of the National Academy of Sciences.
[18] Dieter Forster. Hydrodynamic fluctuations, broken symmetry, and correlation functions, 1975.
[19] Ewen Callaway, et al. ‘It will change everything’: DeepMind’s AI makes gigantic leap in solving protein structures, 2020, Nature.
[20] Stefano Soatto, et al. Stochastic Gradient Descent Performs Variational Inference, Converges to Limit Cycles for Deep Networks, 2018, Information Theory and Applications Workshop (ITA).
[21] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.
[22] Yao Zhang, et al. Energy–entropy competition and the effectiveness of stochastic gradient descent in machine learning, 2018, Molecular Physics.
[23] Yuanzhi Li, et al. Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data, 2018, NeurIPS.
[24] Andrew M. Saxe, et al. High-dimensional dynamics of generalization error in neural networks, 2017, Neural Networks.
[25] Jason Weston, et al. Curriculum learning, 2009, ICML '09.
[26] Hossein Mobahi, et al. Fantastic Generalization Measures and Where to Find Them, 2019, ICLR.
[27] Levent Sagun, et al. Scaling description of generalization with number of parameters in deep learning, 2019, Journal of Statistical Mechanics: Theory and Experiment.
[28] Andrea Montanari, et al. The Generalization Error of Random Features Regression: Precise Asymptotics and the Double Descent Curve, 2019, Communications on Pure and Applied Mathematics.
[29] Naftali Tishby, et al. Opening the Black Box of Deep Neural Networks via Information, 2017, arXiv.
[30] H. Robbins. A Stochastic Approximation Method, 1951, The Annals of Mathematical Statistics.
[31] Nathan Srebro, et al. Exploring Generalization in Deep Learning, 2017, NIPS.