Training Neural Networks Using Predictor-Corrector Gradient Descent
[1] Alexandre d'Aspremont, et al. Regularized nonlinear acceleration, 2016, Mathematical Programming.
[2] Marc Teboulle, et al. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, 2009, SIAM J. Imaging Sci..
[3] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[4] Yu Zhang, et al. Prediction-adaptation-correction recurrent neural networks for low-resource language speech recognition, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] M. Frisch, et al. Steepest descent reaction path integration using a first-order predictor-corrector method, 2010, The Journal of Chemical Physics.
[6] Andrew Y. Ng, et al. Reading Digits in Natural Images with Unsupervised Feature Learning, 2011.
[7] E. Süli, et al. An Introduction to Numerical Analysis, 2003.
[8] Seunghak Lee, et al. More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server, 2013, NIPS.
[9] Yu Zhang, et al. Speech recognition with prediction-adaptation-correction recurrent neural networks, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] Massimiliano Di Luca, et al. Optimal Perceived Timing: Integrating Sensory Information with Dynamically Updated Expectations, 2016, Scientific Reports.
[11] Sebastian Nowozin, et al. Learning Step Size Controllers for Robust Neural Network Training, 2016, AAAI.
[12] Matthew D. Zeiler. ADADELTA: An Adaptive Learning Rate Method, 2012, ArXiv.
[13] Boris Polyak. Some methods of speeding up the convergence of iteration methods, 1964.
[14] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[15] Timothy Dozat. Incorporating Nesterov Momentum into Adam, 2016.
[16] Yoram Singer, et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res..
[17] Quoc V. Le, et al. Adding Gradient Noise Improves Learning for Very Deep Networks, 2015, ArXiv.
[18] David J. Heeger, et al. Theory of cortical function, 2017, Proceedings of the National Academy of Sciences.
[19] Marcin Andrychowicz, et al. Learning to learn by gradient descent by gradient descent, 2016, NIPS.
[20] Costanzo Manes, et al. An incremental least squares algorithm for large scale linear classification, 2013, Eur. J. Oper. Res..
[21] Aryan Mokhtari, et al. A Class of Prediction-Correction Methods for Time-Varying Convex Optimization, 2015, IEEE Transactions on Signal Processing.