Optimizer Benchmarking Needs to Account for Hyperparameter Tuning
Prabhu Teja Sivaprasad | Florian Mai | Thijs Vogels | Martin Jaggi | François Fleuret — Idiap Research Institute & EPFL