Learning Temporal Dependence from Time-Series Data with Latent Variables

We consider the setting where a collection of time series, modeled as random processes, evolves in a causal manner, and one is interested in learning the graph governing the relationships among these processes. A special case of wide interest and applicability is the setting where the noise is Gaussian and the relationships are Markov and linear. We study this setting with two additional features. First, each random process has a hidden (latent) state, which we use to model the internal memory possessed by the variables (similar to hidden Markov models). Second, each variable can depend on its latent memory state through a random lag (rather than a fixed lag), thus modeling memory recall with differing lags at distinct times. Under this setting, we develop an estimator and prove that, under a genericity assumption, the parameters of the model can be learned consistently. We also propose a practical adaptation of this estimator, which demonstrates significant performance gains on both synthetic and real-world datasets.
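To make the baseline setting concrete, the following is a minimal sketch of the special case described above (Gaussian noise, linear and Markov relationships), with no latent states and no random lags. It is not the estimator developed in this work: the names `simulate_var1` and `estimate_var1`, the example matrix `A_true`, and the edge threshold of 0.2 are all assumptions chosen for illustration, and the recovery step is plain least squares.

```python
import numpy as np

def simulate_var1(A, n_steps, noise_std=1.0, seed=0):
    """Simulate x[t] = A @ x[t-1] + Gaussian noise (a first-order linear Markov model)."""
    rng = np.random.default_rng(seed)
    p = A.shape[0]
    x = np.zeros((n_steps, p))
    for t in range(1, n_steps):
        x[t] = A @ x[t - 1] + noise_std * rng.standard_normal(p)
    return x

def estimate_var1(x):
    """Least-squares estimate of A from consecutive observation pairs."""
    past, future = x[:-1], x[1:]
    # Solve future ≈ past @ A.T in the least-squares sense.
    coef, *_ = np.linalg.lstsq(past, future, rcond=None)
    return coef.T

# Hypothetical 3-process chain: process 0 drives 1, and 1 drives 2.
A_true = np.array([[0.5, 0.0, 0.0],
                   [0.4, 0.5, 0.0],
                   [0.0, 0.4, 0.5]])

x = simulate_var1(A_true, n_steps=5000)
A_hat = estimate_var1(x)

# Read off the dependence graph by thresholding (threshold is an illustrative choice).
graph = np.abs(A_hat) > 0.2
```

Here `graph[i, j]` being true is read as "process j influences process i". Once each process also carries a hidden memory state that is recalled at a random lag, this direct regression on observed values no longer matches the model, which is the gap the estimator in this work is meant to address.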
