PSEUDO-LIKELIHOOD ESTIMATION FOR INCOMPLETE DATA

In statistical practice, incomplete measurement sequences are the rule rather than the exception. Fortunately, in a large variety of settings, the stochastic mechanism governing the incompleteness can be ignored without hampering inferences about the measurement process. While ignorability requires only the relatively general missing at random assumption for likelihood and Bayesian inferences, this result cannot be invoked when non-likelihood methods are used. As a direct consequence, a popular non-likelihood-based method such as generalized estimating equations needs to be adapted to a weighted or doubly robust version when a missing at random process operates. So far, no such modification has been devised for pseudo-likelihood based strategies. We propose a suite of corrections to the standard form of pseudo-likelihood to ensure its validity under missingness at random. Our corrections follow both single and double robustness ideas, and are relatively simple to apply. When missingness takes the form of dropout in longitudinal data or of incomplete clusters, this structure can be exploited to obtain further corrections. The proposed methods are applied to data from a clinical trial in onychomycosis and from a developmental toxicity study.
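
The need for a correction under missingness at random can be illustrated with a minimal sketch. It is not the pseudo-likelihood correction proposed here: it uses the simplest non-likelihood estimator, a sample mean, and weights each observed outcome by the inverse of its observation probability (the single robustness idea). The data-generating values, the function names, and the assumption that the observation probabilities are known are all hypothetical choices made for the illustration; in practice these probabilities would have to be estimated.

```python
import random

def simulate(n=20000, seed=1):
    """Binary outcome y, fully observed binary covariate x, MAR missingness in y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        # P(y = 1 | x): 0.8 when x = 1, 0.2 when x = 0, so the true mean of y is 0.5
        y = 1 if rng.random() < (0.8 if x else 0.2) else 0
        # Observation probability depends only on the observed x (missing at random)
        pi = 0.9 if x else 0.3
        r = 1 if rng.random() < pi else 0
        rows.append((x, y, r, pi))
    return rows

def complete_case_mean(rows):
    """Unweighted mean over observed outcomes; biased under this MAR mechanism."""
    observed = [y for _, y, r, _ in rows if r]
    return sum(observed) / len(observed)

def ipw_mean(rows):
    """Inverse probability weighted mean, normalized by the sum of the weights."""
    num = sum(y / pi for _, y, r, pi in rows if r)
    den = sum(1 / pi for _, _, r, pi in rows if r)
    return num / den

rows = simulate()
cc = complete_case_mean(rows)
ipw = ipw_mean(rows)
print(f"complete-case mean: {cc:.3f}")
print(f"IPW mean:           {ipw:.3f}")
```

Under these assumed values the unweighted estimator converges to 0.65 rather than to the true mean of 0.5, because units with x = 1 are over-represented among the observed outcomes, whereas the weighted estimator is consistent for 0.5.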
