Behnam Neyshabur | Hanie Sedghi | Preetum Nakkiran