Statistical models for count time series with excess zeros

Count time series arise frequently in biomedical and public health applications. In disease surveillance, for example, public health officials monitor the occurrence of rare infections over time, and the resulting time series are used to detect changes in disease activity. For rare diseases with low infection rates, the observed counts typically contain a high frequency of zeros (zero-inflation), yet the counts can also be very large during an outbreak. Failure to account for zero-inflation may lead to misleading inference and the detection of spurious associations. In this thesis, we develop two classes of statistical models for zero-inflated time series. The first part of the thesis introduces a class of observation-driven models in a partial likelihood framework. The expectation-maximization (EM) algorithm is applied to obtain the maximum partial likelihood estimator (MPLE), and we establish the asymptotic theory of the MPLE under certain regularity conditions. The performance of several partial-likelihood-based model selection criteria is compared under model misspecification. The second part of the thesis introduces a class of parameter-driven models in a state-space framework. To estimate the model parameters, we devise a Monte Carlo EM algorithm in which particle filtering and particle smoothing methods approximate the high-dimensional integrals in the E-step. Upon convergence, Louis' formula is used to compute the observed information matrix. The proposed models are illustrated with simulated data and with an application to public health surveillance for syphilis, a sexually transmitted disease (STD) that remains a major public health challenge in the United States. An R package, ZIM (Zero-Inflated Models), has been developed to fit both the observation-driven and the parameter-driven models.
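
As a rough illustration of how an EM algorithm handles excess zeros, the sketch below fits a plain zero-inflated Poisson (ZIP) distribution to independent counts. It is deliberately much simpler than the models in the thesis: it has no covariates, no autoregressive terms, and no partial likelihood, and it does not use the ZIM package. The function names (poisson_draw, simulate_zip, fit_zip_em) and the parameter names omega (probability of a structural zero) and lam (Poisson mean) are assumptions made for this example only.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate (Knuth's method; fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_zip(n, omega, lam, seed=0):
    """Simulate n i.i.d. ZIP counts: a structural zero with probability omega,
    otherwise a Poisson(lam) count (which may itself be zero)."""
    rng = random.Random(seed)
    return [0 if rng.random() < omega else poisson_draw(rng, lam)
            for _ in range(n)]


def fit_zip_em(y, n_iter=200, tol=1e-10):
    """EM estimates of (omega, lam) for i.i.d. ZIP data (illustrative only)."""
    n = len(y)
    n_zero = sum(1 for v in y if v == 0)
    total = sum(y)
    # Crude starting values: half of the observed zeros treated as structural.
    omega = 0.5 * n_zero / n
    lam = max(total / n, 1e-8)
    for _ in range(n_iter):
        # E-step: posterior probability that an observed zero is structural.
        p0 = omega / (omega + (1.0 - omega) * math.exp(-lam))
        z_sum = n_zero * p0  # expected number of structural zeros
        # M-step: closed-form updates given the expected latent indicators.
        new_omega = z_sum / n
        new_lam = total / (n - z_sum)
        converged = abs(new_omega - omega) + abs(new_lam - lam) < tol
        omega, lam = new_omega, new_lam
        if converged:
            break
    return omega, lam


if __name__ == "__main__":
    counts = simulate_zip(5000, omega=0.6, lam=3.0, seed=1)
    omega_hat, lam_hat = fit_zip_em(counts)
    print(f"omega_hat = {omega_hat:.3f}, lam_hat = {lam_hat:.3f}")
```

The E-step only needs the posterior probability that each observed zero is structural, which is why both M-step updates are available in closed form here. In a regression or time series setting the M-step would instead require weighted fits for the zero-inflation and count components.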
