Yann Dauphin | Jeffrey Pennington | Atish Agarwala | Sam Schoenholz