Efficient Algorithms for Robust Estimation in Linear Mixed-Effects Models Using the Multivariate t Distribution

Linear mixed-effects models are frequently used to analyze repeated measures data, because they flexibly model the within-subject correlation often present in this type of data. The most popular linear mixed-effects model for a continuous response assumes normal distributions for the random effects and the within-subject errors, making it sensitive to outliers. Such outliers are more problematic for mixed-effects models than for fixed-effects models, because they may occur in the random effects, in the within-subject errors, or in both, making them harder to detect in practice. Motivated by a real dataset from an orthodontic study, we propose a robust hierarchical linear mixed-effects model in which the random effects and the within-subject errors have multivariate t distributions, with known or unknown degrees of freedom that are allowed to vary across groups of subjects. By using a gamma-normal hierarchical structure, our model allows the identification and classification of both types of outliers, comparing favorably to other multivariate t models for robust estimation in mixed-effects models previously described in the literature, which use only the marginal distribution of the responses. When the degrees of freedom are unknown and estimated from the data, our model balances robustness and efficiency, leading to reliable results for valid inference. We describe and compare efficient EM-type algorithms, including ECM, ECME, and PX-EM, for maximum likelihood estimation in the robust multivariate t model. We compare the performance of the Gaussian and the multivariate t models under different patterns of outliers. Simulation results indicate that the multivariate t model substantially outperforms the Gaussian model when outliers are present in the data, even in moderate amounts.
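The gamma-normal hierarchical structure mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `sample_gamma_normal` and `expected_weights` are hypothetical, and the sketch assumes the standard scale-mixture representation of a zero-mean multivariate t, in which a latent weight tau ~ Gamma(df/2, rate df/2) divides the covariance of a normal vector. Under that assumption, the conditional expectation of tau given the data is (df + p) / (df + d^2), where d^2 is the squared Mahalanobis distance, so small weights flag potential outliers.

```python
import numpy as np


def sample_gamma_normal(n, scale, df, rng):
    """Draw n zero-mean multivariate t vectors via a gamma-normal hierarchy.

    Assumed representation: tau ~ Gamma(df/2, rate=df/2) and
    y | tau ~ N(0, scale / tau), which gives y a t distribution marginally.
    """
    p = scale.shape[0]
    # NumPy parameterizes the gamma by shape and scale, so scale = 1 / rate.
    tau = rng.gamma(shape=df / 2.0, scale=2.0 / df, size=n)
    z = rng.multivariate_normal(np.zeros(p), scale, size=n)
    # Dividing by sqrt(tau) inflates the spread of low-weight draws.
    return z / np.sqrt(tau)[:, None], tau


def expected_weights(y, scale, df):
    """Return E[tau | y] = (df + p) / (df + d2) for each row of y."""
    p = scale.shape[0]
    # Squared Mahalanobis distance of each row from the origin.
    d2 = np.einsum("ij,jk,ik->i", y, np.linalg.inv(scale), y)
    return (df + p) / (df + d2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y, tau = sample_gamma_normal(500, np.eye(2), df=4.0, rng=rng)
    w = expected_weights(y, np.eye(2), df=4.0)
    # Observations far from the center receive the smallest weights.
    print("smallest weights:", np.sort(w)[:3])
```

In an EM-type algorithm built on this representation, weights of this form would typically enter the M-step as case weights, which is how distant observations get downweighted relative to a Gaussian fit.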
