Accelerating Minibatch Stochastic Gradient Descent Using Typicality Sampling
Xinyu Peng | Li Li | Fei-Yue Wang