influence.ME: Tools for Detecting Influential Data in Mixed Effects Models

influence.ME provides tools for detecting influential data in mixed effects models. The application of these models has become common practice, but the development of diagnostic tools has lagged behind. influence.ME calculates standardized measures of influential data for the point estimates of generalized mixed effects models, such as DFBETAS and Cook's distance, as well as percentile change and a test for changing levels of significance. influence.ME calculates these measures of influence while accounting for the nesting structure of the data. The package and measures of influential data are introduced, a practical example is given, and strategies for dealing with influential data are suggested.

The application of mixed effects regression models has become common practice in the field of social sciences. As used in the social sciences, mixed effects regression models take into account that observations on individual respondents are nested within higher-level groups such as schools, classrooms, states, and countries (Snijders and Bosker, 1999), and are often referred to as multilevel regression models. Despite these models' increasing popularity, diagnostic tools to evaluate fitted models lag behind.

We introduce influence.ME (Nieuwenhuis, Pelzer, and te Grotenhuis, 2012), an R package that provides tools for detecting influential cases in mixed effects regression models estimated with lme4 (Bates and Maechler, 2010). It is commonly accepted that tests for influential data should be performed on regression models, especially when estimates are based on a relatively small number of cases. However, most existing procedures do not account for the nesting structure of the data. As a result, these procedures fail to detect that higher-level cases may be influential on the estimates of variables measured at that same level.
In this paper, we outline the basic rationale for detecting influential data, describe standardized measures of influence, provide a practical example of the analysis of students in 23 schools, and discuss strategies for dealing with influential cases. Testing for influential cases in mixed effects regression models is important, because influential data negatively influence the statistical fit and generalizability of the model. In social science applications of mixed models, testing for influential data is especially important, since these models are frequently based on large numbers of observations at the individual level while the number of higher-level groups is relatively small. For instance, Van der Meer, te Grotenhuis, and Pelzer (2010) were unable to find any country-level comparative studies involving more than 54 countries. With such a relatively low number of countries, a single country can easily be overly influential on the parameter estimates of one or more of the country-level variables.
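
As an illustration of what a standardized measure of influence at the group level looks like, DFBETAS is commonly written as below. This is a sketch in assumed notation, not taken from the text above: \hat{\gamma}_i is the estimate of parameter i from the full data, \hat{\gamma}_{i(-j)} is the estimate after deleting all observations nested within higher-level group j, and n is the number of groups. The cut-off shown is the usual rule of thumb from the regression diagnostics literature, applied here to the number of groups rather than the number of individual observations.

```latex
% DFBETAS for parameter i when higher-level group j is deleted (assumed notation)
\[
\mathrm{DFBETAS}_{ij}
  = \frac{\hat{\gamma}_i - \hat{\gamma}_{i(-j)}}
         {\operatorname{se}\!\left(\hat{\gamma}_{i(-j)}\right)}
\]

% Rule-of-thumb cut-off, with n the number of higher-level groups
\[
\left|\mathrm{DFBETAS}_{ij}\right| > \frac{2}{\sqrt{n}},
\qquad \text{e.g. } n = 23 \text{ schools gives } \frac{2}{\sqrt{23}} \approx 0.42 .
\]
```

Because whole groups are deleted rather than single observations, a measure of this form can flag an influential school or country even when no individual respondent within it is unusual.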

[1] J. Berkhof et al. Diagnostic Checks for Multilevel Models, 2008.

[2] Yvonne Freeh et al. An R and S–PLUS Companion to Applied Regression, 2004.

[3] Ben Pelzer et al. Influential Cases in Multilevel Modeling: A Methodological Comment, 2010.

[4] Jan de Leeuw et al. Introducing Multilevel Modeling, 1998.

[5] T. Lewis et al. Outliers in multilevel data, 1998.

[6] Deepayan Sarkar et al. Lattice: Multivariate Data Visualization with R, 2008.

[7] Michael J. Crawley et al. The R book, 2022.

[8] Roel Bosker et al. Multilevel analysis: an introduction to basic and advanced multilevel modeling, 1999.

[9] A. Hossain et al. A comparative study on detection of influential observations in linear regression, 1991.

[10] Evan S. Lieberman et al. Nested Analysis as a Mixed-Method Strategy for Comparative Research, 2005, American Political Science Review.

[11] Shuangzhe Liu et al. Regression diagnostics, 2020, Applied Quantitative Analysis for Real Estate.

[12] W. W. Muir et al. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, 1980.

[13] R. Cook. Detection of influential observation in linear regression, 2000.

[14] Per Kragh Andersen et al. Regression Modeling Strategies with Applications to Linear Models, Logistic Regression and Survival Analysis. Frank E. Harrell, Jun, Springer‐Verlag, New York, 2001. No. of pages: 568. ISBN 0‐387‐95232‐2, 2003.

[15] Sunil J Rao et al. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis, 2003.