Automating Vector Autoregression on Electronic Patient Diary Data

Finding the best vector autoregression model for any dataset, medical or otherwise, is still frequently done by hand, in an iterative process that requires statistical expertise and time. Very few software solutions automate this process, and those that exist still require statistical expertise to operate. We propose a new application, Autovar, that automates the search for vector autoregression models for time series data. Its approach closely resembles the way experts work manually, but improves on it by leveraging computing power, e.g., by evaluating multiple alternatives instead of committing to just one. In this paper, we describe the design and implementation of Autovar, compare its performance against experts working manually, and compare its features to those of the most widely used commercial solution available today. The main contribution of Autovar is to show that vector autoregression on a large scale is feasible. We also show that an exhaustive approach to model selection can be relatively safe to use. This study is an important step toward making adaptive, personalized treatment available and affordable for all branches of healthcare.
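
To make the idea of exhaustive model selection concrete, the following is a minimal sketch and not Autovar's actual implementation. The abstract does not say which candidate models or selection criteria Autovar uses, so the sketch assumes that lag order is the only dimension being searched, that models are fitted by ordinary least squares, and that candidates are ranked by the Bayesian information criterion. The function names `fit_var`, `bic`, and `select_lag` are hypothetical.

```python
import numpy as np


def fit_var(data, lag):
    """Fit a VAR(lag) with intercept by ordinary least squares.

    data is a (T, k) array of k series observed at T time points.
    Returns the coefficient matrix and the residuals.
    """
    T, _ = data.shape
    # Each design row is [1, y_{t-1}, ..., y_{t-lag}] for t = lag .. T-1.
    lagged = [data[lag - j:T - j] for j in range(1, lag + 1)]
    X = np.hstack([np.ones((T - lag, 1))] + lagged)
    Y = data[lag:]
    coefs, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coefs, Y - X @ coefs


def bic(resid, n_params):
    """Multivariate BIC: log-determinant of the residual covariance plus penalty."""
    n = resid.shape[0]
    sigma = resid.T @ resid / n
    _, logdet = np.linalg.slogdet(sigma)
    return logdet + np.log(n) * n_params / n


def select_lag(data, max_lag):
    """Score every lag order from 1 to max_lag and return the lowest-BIC one."""
    scores = {}
    for lag in range(1, max_lag + 1):
        # Trim the start so every candidate is scored on the same sample.
        coefs, resid = fit_var(data[max_lag - lag:], lag)
        scores[lag] = bic(resid, coefs.size)
    return min(scores, key=scores.get), scores


# Illustrative data only: a simulated two-variable VAR(1), not patient diary data.
rng = np.random.default_rng(0)
A = np.array([[0.5, 0.2], [0.0, 0.4]])
y = np.zeros((500, 2))
for t in range(1, 500):
    y[t] = A @ y[t - 1] + rng.normal(size=2)

best_lag, all_scores = select_lag(y, max_lag=4)
print(best_lag, all_scores)
```

Because every candidate is scored rather than only the first acceptable one, the full `all_scores` table remains available for inspection. A real pipeline would presumably also check residual assumptions before accepting a model, which this sketch omits.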
