Deletion measures for generalized linear mixed effects models

Generalized linear mixed models (GLMMs) are widely applied in practice. As in other data analyses, identifying influential observations, which may be outliers, is an important step beyond estimation in GLMMs. Since the pioneering work of Cook in 1977, deletion measures have been applied to many statistical models for identifying influential observations. However, because this well-known approach is based on the observed-data likelihood, it is very difficult to apply to GLMMs, whose observed-data likelihood involves multidimensional integrals. The objective of this article is to develop diagnostic measures for identifying influential observations in GLMMs. The deletion measures are based on the conditional expectation of the complete-data log-likelihood at the E-step of a stochastic approximation Markov chain Monte Carlo algorithm. Because by-products of the estimation are used to compute the building blocks of the proposed diagnostic measures, and appropriate approximations are employed, the proposed methods require little additional computation. The performance of the methods is illustrated by an artificial example, a real example, and some simulation studies.
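For readers unfamiliar with deletion measures, the following minimal sketch illustrates the classical idea in the simplest setting, ordinary linear regression, where Cook's distance has a closed form. It is not the GLMM measure developed in the article, which works with the conditional expectation of the complete-data log-likelihood instead of the observed-data likelihood. The function names `cooks_distance` and `cooks_distance_by_refit` and the simulated data are assumptions made for illustration only.

```python
import numpy as np

def cooks_distance(X, y):
    """Closed-form Cook's distance for each case of an OLS fit."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)               # residual variance estimate
    # Diagonal of the hat matrix H = X (X'X)^{-1} X' (leverages).
    h = np.einsum("ij,ij->i", X @ np.linalg.inv(X.T @ X), X)
    return resid**2 * h / (p * s2 * (1.0 - h) ** 2)

def cooks_distance_by_refit(X, y):
    """Same quantity obtained by literally deleting each case and refitting."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)
    XtX = X.T @ X
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        d = beta_i - beta                      # shift in the estimate
        out[i] = d @ XtX @ d / (p * s2)
    return out

# Simulated data (hypothetical): a straight line with one planted outlier.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=x.size)
y[29] += 5.0                                   # contaminate the last case

D_closed = cooks_distance(X, y)
D_refit = cooks_distance_by_refit(X, y)
print("most influential case:", int(np.argmax(D_closed)))
```

The closed form avoids refitting the model once per deleted case. In GLMMs no such shortcut is available from the observed-data likelihood, which is the difficulty the article addresses by reusing quantities already produced during estimation.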
