Likelihood-based inference for antedependence (Markov) models for categorical longitudinal data

Antedependence (AD) of order p, also known as the Markov property of order p, is a property of index-ordered random variables in which each variable, given at least p immediately preceding variables, is independent of all further preceding variables. Zimmerman and Nunez-Anton (2010) present statistical methodology for fitting and performing inference for AD models for continuous (primarily normal) longitudinal data. However, analogous AD-model methodology for categorical longitudinal data has not yet been well developed. In this thesis, we derive maximum likelihood estimators of transition probabilities under antedependence of any order, and we use these estimators to develop likelihood-based methods for determining the order of antedependence of categorical longitudinal data. Specifically, we develop a penalized likelihood method for determining variable-order antedependence structure, and we derive the likelihood ratio test, score test, Wald test, and an adaptation of Fisher's exact test for p-order antedependence against the unstructured (saturated) multinomial model. Simulation studies show that the score (Pearson's chi-square) test performs better than all the other methods for complete data and for data with a monotone missing pattern, while the likelihood ratio test is applicable to data with an arbitrary missing pattern. Because the likelihood ratio test rejects too often under the null hypothesis, we modify it by equating the expectation of the test statistic to its degrees of freedom, so that its actual size is closer to the nominal size. Additionally, we modify the likelihood ratio tests for testing p-order antedependence against q-order antedependence, where q > p, and for testing nested variable-order antedependence models. We extend the methods to data having a monotone or arbitrary missing pattern. For antedependence models of constant order

[1] D. Cox et al. Analysis of Binary Data (2nd ed.), 1990.

[2] T. W. Anderson et al. Statistical Inference about Markov Chains, 1957.

[3] Bonnie K. Ray et al. Regression Models for Time Series Analysis, 2003, Technometrics.

[4] Eric R. Ziegel et al. Analysis of Binary Data (2nd ed.), 1991.

[5] A. Raftery et al. The Mixture Transition Distribution Model for High-Order Markov Chains and Non-Gaussian Time Series, 2002.

[6] Joseph B. Lang et al. Multinomial-Poisson homogeneous models for contingency tables, 2003.

[7] David R. Cox. The analysis of binary data, 1970.

[8] Meir Feder et al. A universal finite memory source, 1995, IEEE Trans. Inf. Theory.

[9] D. Rubin et al. Maximum likelihood from incomplete data via the EM algorithm (with discussion), 1977.

[10] Scott L. Zeger et al. Lorelogram: A Regression Approach to Exploring Dependence in Longitudinal Categorical Responses, 1998.

[11] D. Zimmerman et al. Antedependence Models for Longitudinal Data, 2009.

[12] Alan Agresti et al. Categorical Data Analysis, 2003.

[13] Michael J Daniels et al. A class of Markov models for longitudinal ordinal data, 2007, Biometrics.

[14] S. Arnold et al. Variable order ante-dependence models, 1994.

[15] P. Bühlmann et al. Variable Length Markov Chains: Methodology, Computing, and Software, 2004.

[16] Patrick J Heagerty et al. Marginalized Transition Models and Likelihood Inference for Longitudinal Categorical Data, 2002, Biometrics.

[17] S. Zeger et al. Markov regression models for time series: a quasi-likelihood approach, 1988, Biometrics.

[18] P. Diggle. Analysis of Longitudinal Data, 1995.

[19] David Williams et al. Improved likelihood ratio tests for complete contingency tables, 1976.

[20] David E. Booth et al. Analysis of Incomplete Multivariate Data, 2000, Technometrics.

[21] Farid Kianifard et al. Models for Repeated Measurements, 2001, Technometrics.

[22] A. Azzalini. Logistic regression for autocorrelated data with application to repeated measures, 1994.

[23] Scott L. Zeger et al. Marginalized Multilevel Models and Likelihood Inference, 2000.

[24] James J. Heckman et al. A Beta-logistic Model for the Analysis of Sequential Labor Force Participation by Married Women, 1975, Journal of Political Economy.

[25] David R. Anderson et al. Model Selection and Multimodel Inference, 2003.

[26] Benjamin Kedem et al. Regression models for time series analysis, 2002.

[27] K. Gabriel et al. Ante-dependence Analysis of an Ordered Set of Variables, 1962.

[28] G. Molenberghs et al. Models for Discrete Longitudinal Data, 2005.

[29] Tapabrata Maiti et al. Analysis of Longitudinal Data (2nd ed.) (Book), 2004.