Erik Meijer | Kartik Chandra | Johann George | Samantha Andow | Emilio Arroyo-Fang | Irene Dea | Melissa Grueter | Basil Hosmer | Steffi Stumpos | Alanna Tempest | Shannon Yang
[1] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.
[2] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[3] Sebastian Ruder, et al. An overview of gradient descent optimization algorithms, 2016, ArXiv.
[4] Fabian Pedregosa, et al. Hyperparameter optimization with approximate gradient, 2016, ICML.
[5] Yoshua Bengio, et al. Gradient-Based Optimization of Hyperparameters, 2000, Neural Computation.
[6] Mark W. Schmidt, et al. Online Learning Rate Adaptation with Hypergradient Descent, 2017, ICLR.
[7] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.
[8] Simon Haykin, et al. Gradient-Based Learning Applied to Document Recognition, 2001.
[9] Luca Antiga, et al. Automatic differentiation in PyTorch, 2017.
[10] Andreas Griewank, et al. Who Invented the Reverse Mode of Differentiation, 2012.
[11] Thibault Langlois, et al. Parameter adaptation in stochastic optimization, 1999.
[12] Frank Hutter, et al. Hyperparameter Optimization, 2019, Automated Machine Learning.
[13] Tapani Raiko, et al. Scalable Gradient-Based Tuning of Continuous Regularization Hyperparameters, 2015, ICML.
[14] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, ArXiv.
[15] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.