Optimizing Convergence for Iterative Learning of ARIMA for Stationary Time Series

Forecasting time series in continuous systems has become an increasingly relevant task due to recent developments in IoT and 5G. The popular forecasting model ARIMA has been applied to a wide variety of applications for decades. An online variant of ARIMA applies the Online Newton Step to learn the underlying process of the time series. This optimization method has pitfalls concerning computational complexity and convergence. This work therefore focuses on the computationally less expensive Online Gradient Descent optimization method, which has become popular for training neural networks in recent years. For the iterative training of such models, we propose a new approach that combines different Online Gradient Descent learners (such as Adam, AMSGrad, Adagrad, and Nesterov) to achieve fast convergence. The evaluation on synthetic data and experimental datasets shows that the proposed approach outperforms the existing methods, resulting in an overall lower prediction error.
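
To make the setting concrete, the following is a minimal sketch of online AR-style learning with a plain Online Gradient Descent update. The window length k, learning rate, squared-error loss, and the omission of differencing are illustrative assumptions, not values from the paper; the proposed approach combines several OGD-based learners (Adam, AMSGrad, Adagrad, Nesterov) rather than using a single plain update as shown here.

```python
import numpy as np

# Minimal sketch: learn AR coefficients online with plain OGD.
# Illustrative only; k, lr and the squared-error loss are assumptions.
def online_ar_ogd(series, k=5, lr=0.01):
    """Predict each point from the previous k points and take one
    gradient step on the squared prediction error per observation."""
    w = np.zeros(k)                      # AR coefficients, learned online
    predictions = []
    for t in range(k, len(series)):
        window = series[t - k:t]         # most recent k observations
        y_hat = w @ window               # one-step-ahead prediction
        predictions.append(y_hat)
        error = y_hat - series[t]        # prediction error on the true value
        grad = 2.0 * error * window      # gradient of the squared loss
        w -= lr * grad                   # plain Online Gradient Descent update
    return np.array(predictions), w

# Usage on a toy stationary AR(2) process
rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(2, 500):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.normal(scale=0.1)
preds, coeffs = online_ar_ogd(x)
print("learned coefficients:", coeffs)
```

Swapping the last update line for an adaptive learner (e.g. an Adam- or Adagrad-style per-coordinate step) is where the compared OGD variants differ; the combination of such learners is the subject of the proposed approach.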
