LALR: Theoretical and Experimental validation of Lipschitz Adaptive Learning Rate in Regression and Neural Networks

We propose a theoretical framework for an adaptive learning rate policy for the Mean Absolute Error loss function and Quantile loss function and evaluate its effectiveness for regression tasks. The framework is based on the theory of Lipschitz continuity, specifically utilizing the relationship between learning rate and Lipschitz constant of the loss function. Based on experimentation, we have found that the adaptive learning rate policy enables up to 20x faster convergence compared to a constant learning rate policy.

[1]  Christoph H. Lampert,et al.  Data-Dependent Stability of Stochastic Gradient Descent , 2017, ICML.

[2]  I. Dimopoulos,et al.  Application of neural networks to modelling nonlinear relationships in ecology , 1996 .

[3]  Youngwook Kee,et al.  Towards Flatter Loss Surface via Nonmonotonic Learning Rate Scheduling , 2018, UAI.

[4]  Krzysztof Sakrejda,et al.  Case Study in Evaluating Time Series Prediction Models Using the Relative Mean Absolute Error , 2016, The American statistician.

[5]  Raúl Rojas,et al.  The Backpropagation Algorithm , 1996 .

[6]  Yoram Singer,et al.  Train faster, generalize better: Stability of stochastic gradient descent , 2015, ICML.

[7]  Yoshua Bengio,et al.  Practical Recommendations for Gradient-Based Training of Deep Architectures , 2012, Neural Networks: Tricks of the Trade.

[8]  Jason Yosinski,et al.  Deep neural networks are easily fooled: High confidence predictions for unrecognizable images , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Skipper Seabold,et al.  Statsmodels: Econometric and Statistical Modeling with Python , 2010, SciPy.

[10]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[11]  Sebastian Ruder,et al.  An overview of gradient descent optimization algorithms , 2016, Vestnik komp'iuternykh i informatsionnykh tekhnologii.

[12]  Alexander J. Smola,et al.  Nonparametric Quantile Estimation , 2006, J. Mach. Learn. Res..

[13]  Cynthia Rudin,et al.  Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead , 2018, Nature Machine Intelligence.