SAMoSSA: Multivariate Singular Spectrum Analysis with Stochastic Autoregressive Noise

The well-established practice of time series analysis involves estimating deterministic, non-stationary trend and seasonality components, followed by learning the residual stochastic, stationary components. Recently, it has been shown that the deterministic non-stationary components can be learned accurately using multivariate Singular Spectrum Analysis (mSSA) in the absence of a correlated stationary component; conversely, in the absence of deterministic non-stationary components, the Autoregressive (AR) stationary component can be readily learned, e.g., via Ordinary Least Squares (OLS). However, a theoretical underpinning of multi-stage learning algorithms that involve both deterministic and stationary components has been absent from the literature, despite the pervasiveness of such procedures in practice. We resolve this open question by establishing desirable theoretical guarantees for a natural two-stage algorithm, in which mSSA is first applied to estimate the non-stationary components despite the presence of a correlated stationary AR component, which is subsequently learned from the residual time series. We provide a finite-sample forecasting consistency bound for the proposed algorithm, SAMoSSA, which is data-driven and thus requires minimal parameter tuning. To establish the theoretical guarantees, we overcome three hurdles: (i) we characterize the spectra of Page matrices of stable AR processes, thereby extending the analysis of mSSA; (ii) we extend the analysis of AR process identification to the presence of arbitrary bounded perturbations; (iii) we characterize the out-of-sample (forecasting) error, as opposed to solely considering model identification. Through representative empirical studies, we validate the superior performance of SAMoSSA over existing baselines. Notably, SAMoSSA's ability to account for AR noise structure yields improvements ranging from 5% to 37% across various benchmark datasets.

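To make the two-stage structure concrete, the sketch below (Python, numpy only) illustrates the idea on a synthetic univariate series: a Page matrix with hard singular-value truncation stands in for the mSSA step, and the residual AR coefficients are recovered by ordinary least squares. The lag length L, rank k, and AR order p used here are illustrative assumptions rather than SAMoSSA's data-driven choices, and the forecasting step for the deterministic component is omitted for brevity; this is a minimal sketch of the idea, not the paper's implementation.

import numpy as np

def page_matrix(x, L):
    # Stack non-overlapping length-L segments of x as columns (Page matrix).
    n = (len(x) // L) * L
    return x[:n].reshape(-1, L).T          # shape (L, n // L)

def stage1_mssa(x, L, k):
    # Estimate the deterministic (non-stationary) component via a rank-k
    # approximation of the Page matrix (hard singular-value truncation).
    P = page_matrix(x, L)
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    P_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    return P_hat.T.reshape(-1)             # flatten back into a time series

def stage2_ar_ols(r, p):
    # Fit an AR(p) model to the residual series r by ordinary least squares.
    X = np.column_stack([r[p - j - 1 : len(r) - j - 1] for j in range(p)])
    y = r[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef                            # AR coefficients for lags 1..p

# Usage on a synthetic series: linear trend + seasonality + AR(1) noise.
rng = np.random.default_rng(0)
t = np.arange(2000)
noise = np.zeros(2000)
for i in range(1, 2000):                   # AR(1) noise with coefficient 0.6
    noise[i] = 0.6 * noise[i - 1] + rng.normal(scale=0.5)
x = 0.002 * t + np.sin(2 * np.pi * t / 50) + noise

f_hat = stage1_mssa(x, L=40, k=5)          # stage 1: deterministic component
resid = x[: len(f_hat)] - f_hat            # residual fed to the AR stage
p = 1
phi = stage2_ar_ols(resid, p)              # stage 2: AR fit via OLS
next_resid = phi @ resid[::-1][:p]         # one-step forecast of the residual
print("estimated AR(1) coefficient:", phi)
print("one-step residual forecast:", next_resid)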