暂无分享,去创建一个
[1] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[2] Yoshua Bengio,et al. Practical Recommendations for Gradient-Based Training of Deep Architectures , 2012, Neural Networks: Tricks of the Trade.
[3] Yingbin Liang,et al. Median-Truncated Gradient Descent: A Robust and Scalable Nonconvex Approach for Signal Estimation , 2019, Applied and Numerical Harmonic Analysis.
[4] Zhiwei Steven Wu,et al. Distributed Training with Heterogeneous Data: Bridging Median- and Mean-Based Algorithms , 2019, NeurIPS.
[5] Razvan Pascanu,et al. Sharp Minima Can Generalize For Deep Nets , 2017, ICML.
[6] Jorge Nocedal,et al. Optimization Methods for Large-Scale Machine Learning , 2016, SIAM Rev..
[7] J. van Leeuwen,et al. Neural Networks: Tricks of the Trade , 2002, Lecture Notes in Computer Science.
[8] Quanquan Gu,et al. Stochastic Variance-Reduced Hamilton Monte Carlo Methods , 2018, ICML.
[9] Jürgen Schmidhuber,et al. Flat Minima , 1997, Neural Computation.
[10] François Chollet,et al. Xception: Deep Learning with Depthwise Separable Convolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Leon Wenliang Zhong,et al. Fast Stochastic Alternating Direction Method of Multipliers , 2013, ICML.
[12] Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.
[13] Marc'Aurelio Ranzato,et al. Large Scale Distributed Deep Networks , 2012, NIPS.
[14] Tong Zhang,et al. Solving large scale linear prediction problems using stochastic gradient descent algorithms , 2004, ICML.
[15] H. Robbins. A Stochastic Approximation Method , 1951 .
[16] Prateek Jain,et al. On the Insufficiency of Existing Momentum Schemes for Stochastic Optimization , 2018, 2018 Information Theory and Applications Workshop (ITA).
[17] Tong Zhang,et al. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction , 2013, NIPS.
[18] Kannan Ramchandran,et al. Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates , 2018, ICML.
[19] Warren B. Powell,et al. Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming , 2006, Machine Learning.
[20] Kilian Q. Weinberger,et al. Deep Networks with Stochastic Depth , 2016, ECCV.
[21] Léon Bottou,et al. Large-Scale Machine Learning with Stochastic Gradient Descent , 2010, COMPSTAT.
[22] Michael I. Jordan,et al. On the Theory of Variance Reduction for Stochastic Gradient Monte Carlo , 2018, ICML.
[23] Atsushi Sato,et al. Layer-Wise Weight Decay for Deep Neural Networks , 2017, PSIVT.
[24] Yoram Singer,et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..
[25] Zebang Shen,et al. Adaptive Variance Reducing for Stochastic Gradient Descent , 2016, IJCAI.
[26] Yuanzhi Li,et al. An Alternative View: When Does SGD Escape Local Minima? , 2018, ICML.
[27] R. Venkatesh Babu,et al. Generalized Dropout , 2016, ArXiv.
[28] Furong Huang,et al. Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition , 2015, COLT.
[29] Mark W. Schmidt,et al. A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets , 2012, NIPS.
[30] Kaiming He,et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour , 2017, ArXiv.
[31] Fanhua Shang,et al. A Simple Stochastic Variance Reduced Algorithm with Fast Convergence Rates , 2018, ICML.
[32] Zeyuan Allen Zhu,et al. Variance Reduction for Faster Non-Convex Optimization , 2016, ICML.
[33] Shie Mannor,et al. Outlier Robust Online Learning , 2017, ArXiv.
[34] Zhanxing Zhu,et al. The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects , 2018, ICML.
[35] Heng Huang,et al. Asynchronous Mini-Batch Gradient Descent with Variance Reduction for Non-Convex Optimization , 2017, AAAI.
[36] Yanyao Shen,et al. Learning with Bad Training Data via Iterative Trimmed Loss Minimization , 2018, ICML.
[37] Brendan J. Frey,et al. Adaptive dropout for training deep neural networks , 2013, NIPS.
[38] Roland Vollgraf,et al. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms , 2017, ArXiv.
[39] Gregory Cohen,et al. EMNIST: Extending MNIST to handwritten letters , 2017, 2017 International Joint Conference on Neural Networks (IJCNN).
[40] Michael I. Jordan,et al. How to Escape Saddle Points Efficiently , 2017, ICML.
[41] Y. Nesterov. A method for unconstrained convex minimization problem with the rate of convergence o(1/k^2) , 1983 .
[42] Byung-Woo Hong,et al. Adaptive Weight Decay for Deep Neural Networks , 2019, IEEE Access.
[43] Dan Alistarh,et al. Byzantine Stochastic Gradient Descent , 2018, NeurIPS.
[44] Bohyung Han,et al. Regularizing Deep Neural Networks by Noise: Its Interpretation and Optimization , 2017, NIPS.
[45] Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.
[46] Jiashi Feng,et al. Efficient Stochastic Gradient Hard Thresholding , 2018, NeurIPS.
[47] Ariel D. Procaccia,et al. Variational Dropout and the Local Reparameterization Trick , 2015, NIPS.
[48] Edward K. Blum,et al. Approximation theory and feedforward networks , 1991, Neural Networks.
[49] Shiyu Chang,et al. Zeroth-Order Stochastic Variance Reduction for Nonconvex Optimization , 2018, NeurIPS.
[50] Nando de Freitas,et al. Unbounded Bayesian Optimization via Regularization , 2015, AISTATS.
[51] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[52] Quoc V. Le,et al. Don't Decay the Learning Rate, Increase the Batch Size , 2017, ICLR.
[53] Stefano Soatto,et al. Entropy-SGD: biasing gradient descent into wide valleys , 2016, ICLR.
[54] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[55] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[56] Alex Krizhevsky,et al. Learning Multiple Layers of Features from Tiny Images , 2009 .
[57] Lutz Prechelt,et al. Early Stopping - But When? , 2012, Neural Networks: Tricks of the Trade.
[58] Eugene S. Edgington,et al. Randomization Tests , 2011, International Encyclopedia of Statistical Science.
[59] Prabhat,et al. Scalable Bayesian Optimization Using Deep Neural Networks , 2015, ICML.