DrMAD: Distilling Reverse-Mode Automatic Differentiation for Optimizing Hyperparameters of Deep Neural Networks
Jie Fu | Hongyin Luo | Jiashi Feng | Kian Hsiang Low | Tat-Seng Chua