This work is part of the ICLR 2019 Reproducibility Challenge; we attempt to reproduce the results of the conference submission "Padam: Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks." Adaptive gradient methods proposed in the past have shown weaker generalization performance than stochastic gradient descent (SGD) with momentum. The authors address this problem by designing a new optimization algorithm that bridges the gap between adaptive gradient methods and SGD with momentum. The method introduces a new tunable hyperparameter, the partially adaptive parameter p, which takes values in [0, 0.5]. We implement the proposed optimizer and use it to mirror the experiments performed by the authors, review and comment on their empirical analysis, and finally propose a future direction for further study of Padam. Our code is available at: this https URL
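To make the role of the partially adaptive parameter p concrete, the sketch below shows a single Padam-style update step in NumPy, following the AMSGrad-style moment updates described in the paper. The function name `padam_step`, the default hyperparameter values, and the placement of the stability constant `eps` are our own illustrative choices (bias correction is omitted for brevity); this is a minimal sketch, not the authors' reference implementation.

```python
import numpy as np

def padam_step(theta, grad, m, v, v_hat,
               lr=0.1, beta1=0.9, beta2=0.999, p=0.125, eps=1e-8):
    """One Padam-style update (sketch): AMSGrad moment estimates with a
    partially adaptive exponent p in [0, 0.5] on the denominator."""
    # Exponential moving averages of the gradient and its square (as in Adam)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # AMSGrad-style correction: keep the element-wise maximum of past second moments
    v_hat = np.maximum(v_hat, v)
    # Partially adaptive step: p = 0.5 recovers an AMSGrad-like update,
    # while p -> 0 removes the adaptive denominator (SGD-with-momentum-like behaviour).
    # eps is an assumed stability constant, added before exponentiation here.
    theta = theta - lr * m / (v_hat + eps) ** p
    return theta, m, v, v_hat
```

In this sketch, setting p = 0.5 gives the fully adaptive (AMSGrad-style) step and smaller values of p reduce the degree of adaptivity; the paper treats p as an additional hyperparameter to be tuned within [0, 0.5].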