Learning High-Dimensional Generalized Linear Autoregressive Models

Vector autoregressive models characterize a variety of time series in which linear combinations of current and past observations can be used to accurately predict future observations. For instance, each element of an observation vector could correspond to a different node in a network, and the parameters of an autoregressive model would correspond to the impact of the network structure on the time series evolution. These models are often used successfully in practice to learn the structure of social, epidemiological, financial, or biological neural networks. However, little is known about statistical guarantees on the estimates of such models in non-Gaussian settings. This paper addresses the inference of the autoregressive parameters and associated network structure within a generalized linear model framework that includes Poisson and Bernoulli autoregressive processes. At the heart of this analysis is a sparsity-regularized maximum likelihood estimator. While sparsity regularization is well studied in the statistics and machine learning communities, existing analysis methods cannot be applied to autoregressive generalized linear models because of the correlations and potential heteroscedasticity inherent in the observations. Sample complexity bounds are derived using a combination of martingale concentration inequalities and modern empirical process techniques for dependent random variables. These bounds, which are supported by several simulation studies, characterize the impact of various network parameters on the estimator performance.
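As an illustration of the kind of estimator described above, the following is a minimal sketch of a sparsity-regularized maximum likelihood fit for a Poisson autoregressive process. It rests on several assumptions that are not stated in the abstract: a log-linear model in which each count at time t+1 is Poisson with rate exp(nu + A x_t), a known offset nu, an entrywise box constraint on A to keep the rates bounded, and a plain proximal gradient (ISTA) solver with a fixed step size. The function names (simulate_poisson_ar, soft_threshold, fit_l1_poisson_ar) and all parameter values are hypothetical and chosen only for illustration; this is not presented as the paper's algorithm.

```python
import numpy as np


def simulate_poisson_ar(A, nu, T, rng):
    """Simulate X[t+1] ~ Poisson(exp(nu + A @ X[t])) starting from X[0] = 0.

    Assumed model (not taken from the abstract): log-linear Poisson
    autoregression with network matrix A and constant offset nu.
    """
    M = A.shape[0]
    X = np.zeros((T + 1, M))
    for t in range(T):
        rate = np.exp(nu + A @ X[t])
        X[t + 1] = rng.poisson(rate)
    return X


def soft_threshold(Z, tau):
    """Entrywise soft-thresholding, the proximal operator of tau * ||Z||_1."""
    return np.sign(Z) * np.maximum(np.abs(Z) - tau, 0.0)


def fit_l1_poisson_ar(X, nu, lam, step=0.05, iters=300, a_max=1.0):
    """L1-regularized Poisson maximum likelihood via proximal gradient.

    Minimizes (1/T) * sum_t sum_m [rate_tm - X[t+1, m] * log(rate_tm)]
    + lam * ||A||_1 over matrices A with entries in [-a_max, a_max].
    The box constraint and fixed step size are illustrative assumptions.
    """
    past, future = X[:-1], X[1:]
    T, M = past.shape
    A = np.zeros((M, M))
    for _ in range(iters):
        rate = np.exp(nu + past @ A.T)        # shape (T, M)
        grad = (rate - future).T @ past / T   # gradient of the smooth part
        A = soft_threshold(A - step * grad, step * lam)
        A = np.clip(A, -a_max, a_max)         # projection onto the box
    return A


# Small demonstration with a sparse, inhibitory (non-positive) network.
rng = np.random.default_rng(0)
M = 4
A_true = -0.5 * np.eye(M)
A_true[0, 1] = -0.8
nu = np.zeros(M)

X = simulate_poisson_ar(A_true, nu, T=400, rng=rng)
A_hat = fit_l1_poisson_ar(X, nu, lam=0.02)
print(np.round(A_hat, 2))
```

The regularization weight lam controls how many entries of the estimate are set exactly to zero; a sufficiently large value returns the all-zero matrix, while smaller values admit more edges in the recovered network.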
