[1] Yann LeCun, et al. Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks, 2018, ArXiv.
[2] Barnabás Póczos, et al. Gradient Descent Provably Optimizes Over-parameterized Neural Networks, 2018, ICLR.
[3] Qi Lei, et al. How Fine-Tuning Allows for Effective Meta-Learning, 2021, NeurIPS.
[4] Noah A. Smith, et al. To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks, 2019, RepL4NLP@ACL.
[5] Wei Hu, et al. A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks, 2018, ICLR.
[6] V. Koltchinskii, et al. Concentration inequalities and moment bounds for sample covariance operators, 2014, arXiv:1405.2468.
[7] Ruosong Wang, et al. Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks, 2019, ICML.
[8] Joel Nothman, et al. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python, 2019, ArXiv.
[9] Nathan Srebro, et al. On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent, 2021, ICML.
[10] Matus Telgarsky, et al. Gradient descent aligns the layers of deep linear networks, 2018, ICLR.
[11] Andrea Montanari, et al. Surprises in High-Dimensional Ridgeless Least Squares Interpolation, 2019, Annals of Statistics.
[12] Maria-Florina Balcan, et al. Risk Bounds for Transferring Representations With and Without Fine-Tuning, 2017, ICML.
[13] Natalia Gimelshein, et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, NeurIPS.
[14] Sham M. Kakade, et al. Few-Shot Learning via Learning the Representation, Provably, 2020, ICLR.
[15] Ken-ichi Kawarabayashi, et al. How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks, 2020, ICLR.
[16] Sanjeev Arora, et al. On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization, 2018, ICML.
[17] W. Kahan, et al. The Rotation of Eigenvectors by a Perturbation. III, 1970.
[18] Francis Bach, et al. Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss, 2020, COLT.
[19] John D. Hunter, et al. Matplotlib: A 2D Graphics Environment, 2007, Computing in Science & Engineering.
[20] Yoshua Bengio, et al. Understanding the difficulty of training deep feedforward neural networks, 2010, AISTATS.
[21] Vladimir Braverman, et al. Benign Overfitting of Constant-Stepsize SGD for Linear Regression, 2021, COLT.
[22] J. Zico Kolter, et al. Generalization in Deep Networks: The Role of Distance from Initialization, 2019, ArXiv.
[23] Philip M. Long, et al. Benign overfitting in linear regression, 2019, Proceedings of the National Academy of Sciences.
[24] Hossein Mobahi, et al. A Unifying View on Implicit Bias in Training Linear Neural Networks, 2021, ICLR.
[25] Jonathan Berant, et al. Injecting Numerical Reasoning Skills into Language Models, 2020, ACL.
[26] Sanjeev Arora, et al. Implicit Regularization in Deep Matrix Factorization, 2019, NeurIPS.
[27] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[28] Kaifeng Lyu, et al. Gradient Descent Maximizes the Margin of Homogeneous Neural Networks, 2019, ICLR.
[29] Doug Downey, et al. Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks, 2020, ACL.
[30] Nathan Srebro, et al. Kernel and Rich Regimes in Overparametrized Models, 2019, COLT.
[31] Vladimir Braverman, et al. Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate, 2020, ICLR.
[32] Hossein Mobahi, et al. Fantastic Generalization Measures and Where to Find Them, 2019, ICLR.
[33] R. Vershynin. High-Dimensional Probability: An Introduction with Applications in Data Science, 2018, Cambridge University Press.
[34] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[35] K. Jarrod Millman, et al. Array programming with NumPy, 2020, Nature.
[36] A. Tsigler, et al. Benign overfitting in ridge regression, 2020.
[37] Samet Oymak, et al. Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks, 2019, AISTATS.
[38] Gintare Karolina Dziugaite, et al. Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data, 2017, UAI.
[39] Nathan Srebro, et al. Characterizing Implicit Bias in Terms of Optimization Geometry, 2018, ICML.
[40] Ming-Wei Chang, et al. Latent Retrieval for Weakly Supervised Open Domain Question Answering, 2019, ACL.
[41] Amir Globerson, et al. Towards Understanding Learning in Neural Networks with Linear Teachers, 2021, ICML.
[42] Nadav Cohen, et al. Implicit Regularization in Deep Learning May Not Be Explainable by Norms, 2020, NeurIPS.
[43] Behnam Neyshabur, et al. What is being transferred in transfer learning?, 2020, NeurIPS.
[44] 知秀 柴田. Understand it in 5 minutes!? Skimming famous papers: Jacob Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2020.
[45] Chandler Davis. The rotation of eigenvectors by a perturbation, 1963.
[46] Nathan Srebro, et al. Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy, 2020, NeurIPS.