Preetum Nakkiran | Gal Kaplun | Yamini Bansal | Tristan Yang | Boaz Barak | Ilya Sutskever
[1] J. L. Graham. Learning to generalize, 1938.
[2] Gerard V. Trunk, et al. A Problem of Dimensionality: A Simple Example, 1979, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[3] R. Duin. Small sample size generalization, 1995.
[4] Robert P. W. Duin, et al. Classifiers in almost empty spaces, 2000, Proceedings 15th International Conference on Pattern Recognition (ICPR-2000).
[5] M. Opper. Statistical Mechanics of Learning: Generalization, 2002.
[6] Robert P. W. Duin, et al. Bagging, Boosting and the Random Subspace Method for Linear Classifiers, 2002, Pattern Analysis & Applications.
[7] Benjamin Recht, et al. Random Features for Large-Scale Kernel Machines, 2007, NIPS.
[8] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.
[9] Robert Tibshirani, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition, 2001, Springer Series in Statistics.
[10] Mauro Cettolo, et al. WIT3: Web Inventory of Transcribed and Translated Talks, 2012, EAMT.
[11] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[12] Robert P. W. Duin, et al. The Dipping Phenomenon, 2012, SSPR/SPR.
[13] Jonathon Shlens, et al. Explaining and Harnessing Adversarial Examples, 2014, ICLR.
[14] Dumitru Erhan, et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Rico Sennrich, et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.
[16] Jian Sun, et al. Identity Mappings in Deep Residual Networks, 2016, ECCV.
[17] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[18] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[19] Luca Antiga, et al. Automatic differentiation in PyTorch, 2017.
[20] Aleksander Madry, et al. Towards Deep Learning Models Resistant to Adversarial Attacks, 2017, ICLR.
[21] Mikhail Belkin, et al. Reconciling modern machine learning and the bias-variance trade-off, 2018, ArXiv.
[22] Mikhail Belkin, et al. Reconciling modern machine-learning practice and the classical bias–variance trade-off, 2018, Proceedings of the National Academy of Sciences.
[23] Levent Sagun, et al. A jamming transition from under- to over-parametrization affects generalization in deep learning, 2018, Journal of Physics A: Mathematical and Theoretical.
[24] Meir Feder, et al. A New Look at an Old Problem: A Universal Learning Approach to Linear Regression, 2019, IEEE International Symposium on Information Theory (ISIT).
[25] Fred Zhang, et al. SGD on Neural Networks Learns Functions of Increasing Complexity, 2019, NeurIPS.
[26] Levent Sagun, et al. The jamming transition as a paradigm to understand the loss landscape of deep neural networks, 2018, Physical Review E.
[27] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[28] Partha P. Mitra, et al. Understanding overfitting peaks in generalization error: Analytical risk curves for l2 and l1 penalized interpolation, 2019, ArXiv.
[29] Anant Sahai, et al. Harmless interpolation of noisy data in regression, 2019, IEEE International Symposium on Information Theory (ISIT).
[30] Quoc V. Le, et al. GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism, 2018, ArXiv.
[31] André Longtin, et al. Learning to generalize, 2019, eLife.
[32] Andrea Montanari, et al. Surprises in High-Dimensional Ridgeless Least Squares Interpolation, 2019, Annals of Statistics.
[33] Myle Ott, et al. fairseq: A Fast, Extensible Toolkit for Sequence Modeling, 2019, NAACL.
[34] Levent Sagun, et al. Scaling description of generalization with number of parameters in deep learning, 2019, Journal of Statistical Mechanics: Theory and Experiment.
[35] Andrew M. Saxe, et al. High-dimensional dynamics of generalization error in neural networks, 2017, Neural Networks.
[36] Philip M. Long, et al. Benign overfitting in linear regression, 2019, Proceedings of the National Academy of Sciences.
[37] Mikhail Belkin, et al. Two models of double descent for weak features, 2019, SIAM J. Math. Data Sci.
[38] Andrea Montanari, et al. The Generalization Error of Random Features Regression: Precise Asymptotics and the Double Descent Curve, 2019, Communications on Pure and Applied Mathematics.