Lu Sheng | Jie Liu | Junjie Yan | Ming Sun | Wanli Ouyang | Chuming Li | Chen Lin