The Design and Implementation of Vector Autoregressive Model and Structural Vector Autoregressive Model Based on Spark

The VAR (vector autoregressive) model is a widely used econometric model that estimates the dynamic relationships among endogenous variables without imposing prior constraints. Because VAR is one of the easiest models to apply to the analysis and prediction of multiple related economic indicators, it has attracted growing attention from economists over the past two decades. However, as data sizes increase, a single computer becomes a processing bottleneck, while distributed computing platforms such as Hadoop and Spark show clear advantages. Because Spark's MLlib lacks VAR-related models, we developed implementations of the VAR and SVAR (structural vector autoregressive) models for a Spark and Hadoop cluster. The SGD (stochastic gradient descent) algorithm is applied after data processing. To verify the approach, we tested the models with data sets of different sizes on two platforms, R and a Spark cluster. A comparison of response times across data sizes on both platforms shows that the developed methods are simple and efficient in a big data environment.
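To make the combination of VAR and SGD concrete, the following is a minimal single-machine sketch, not the Spark implementation described above. It assumes the simplest case, a VAR(1) model y_t = A y_{t-1} + e_t with no intercept, and fits the coefficient matrix A by plain SGD on the squared one-step prediction error. The function names `simulate_var1` and `fit_var1_sgd`, the learning rate, and the epoch count are illustrative assumptions, not taken from the paper.

```python
import random


def simulate_var1(coef, n_steps, noise_std=1.0, seed=0):
    """Simulate y_t = coef * y_{t-1} + e_t with Gaussian noise (hypothetical helper)."""
    rng = random.Random(seed)
    k = len(coef)
    y = [0.0] * k
    series = [y]
    for _ in range(n_steps):
        y = [
            sum(coef[i][j] * y[j] for j in range(k)) + rng.gauss(0.0, noise_std)
            for i in range(k)
        ]
        series.append(y)
    return series


def fit_var1_sgd(series, lr=0.001, epochs=10, seed=0):
    """Estimate the VAR(1) coefficient matrix by SGD (illustrative, assumed settings)."""
    rng = random.Random(seed)
    k = len(series[0])
    coef = [[0.0] * k for _ in range(k)]
    order = list(range(1, len(series)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the (y_{t-1}, y_t) pairs in random order
        for t in order:
            prev, cur = series[t - 1], series[t]
            for i in range(k):
                # Prediction error of equation i for this single observation.
                err = sum(coef[i][j] * prev[j] for j in range(k)) - cur[i]
                # Gradient of 0.5 * err**2 with respect to coef[i][j] is err * prev[j].
                for j in range(k):
                    coef[i][j] -= lr * err * prev[j]
    return coef


if __name__ == "__main__":
    true_coef = [[0.5, 0.1], [0.2, 0.3]]  # a stationary example, chosen arbitrarily
    data = simulate_var1(true_coef, n_steps=5000, seed=1)
    for row in fit_var1_sgd(data):
        print([round(value, 3) for value in row])
```

Each update touches only one observation pair, which is the property that makes SGD attractive when the data are partitioned across a cluster. A distributed version would additionally need a rule for combining updates from different partitions, which this sketch does not attempt.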
