MissMech: An R Package for Testing Homoscedasticity, Multivariate Normality, and Missing Completely at Random (MCAR)

Researchers often must analyze data sets that are incomplete. Analyzing such data properly requires knowledge of the missing data mechanism. If data are missing completely at random (MCAR), then many missing data analysis techniques lead to valid inference, so tests of MCAR are desirable. The package MissMech implements two tests developed by Jamshidian and Jalal (2010) for this purpose, both available through the function TestMCARNormality. One test is valid if the data are normally distributed; the other requires no distributional assumptions. In addition to testing MCAR, in some special cases TestMCARNormality can also test whether data have a multivariate normal distribution. The functions in MissMech can also be used to (i) test homoscedasticity for several groups when data are completely observed, (ii) perform the k-sample Anderson-Darling test to determine whether k groups of univariate data come from the same distribution, (iii) impute incomplete data sets using two methods, one that assumes normality and one that makes no specific distributional assumptions, (iv) obtain normal-theory maximum likelihood estimates of the mean and covariance matrix, along with their standard errors, when data are incomplete, and (v) perform Neyman's test of uniformity. The paper explains all of these features and illustrates them with examples.
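As a rough illustration of the workflow the abstract describes, the sketch below simulates a small incomplete data set and passes it to TestMCARNormality. The simulated data, the variable names, the three missing-data patterns, and the reliance on the function's default arguments are all assumptions made for this example and are not taken from the paper. The call is guarded so that the simulation still runs when MissMech is not installed.

```r
# Simulate 300 cases on 4 standard normal variables (illustrative data only).
set.seed(1)
n <- 300
y <- matrix(rnorm(n * 4), nrow = n, ncol = 4,
            dimnames = list(NULL, paste0("v", 1:4)))

# Assign each case at random to one of three missing-data patterns of 100
# cases each: complete, v1 missing, v4 missing. Because the assignment
# ignores the data values, the missingness is MCAR by construction.
pattern <- sample(rep(1:3, each = n / 3))
y[pattern == 2, "v1"] <- NA
y[pattern == 3, "v4"] <- NA

# Run the tests only if the package is available; default arguments are
# assumed to be adequate for this toy example.
if (requireNamespace("MissMech", quietly = TRUE)) {
  out <- MissMech::TestMCARNormality(data = y)
  print(out)
}
```

Since these data are normal and MCAR by construction, one would not expect the tests to reject, although any single run can still do so by chance.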

[1]  D. Darling, et al. A Test of Goodness of Fit, 1954.

[2]  R Core Team, et al. R: A language and environment for statistical computing, 2014.

[3]  M. Kendall. Statistical Methods for Research Workers, 1937, Nature.

[4]  A. Gelman, et al. Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box, 2011.

[5]  P. Royston. A Remark on Algorithm AS 181: The W-Test for Normality, 1995.

[6]  Douglas M. Hawkins, et al. A new test for multivariate normality and homoscedasticity, 1981.

[7]  M. Jamshidian, et al. Tests of Homoscedasticity, Normality, and Missing Completely at Random for Incomplete Multivariate Data, 2010, Psychometrika.

[8]  P. Bentler, et al. ML Estimation of Mean and Covariance Structures with Missing Data Using Complete Data Routines, 1999.

[9]  R. Fisher, et al. Statistical Methods for Research Workers, 1930, Nature.

[10]  Ke-Hai Yuan, et al. Data-driven sensitivity analysis to detect missing data mechanism with applications to structural equation modelling, 2013.

[11]  M. Stephens, et al. K-Sample Anderson–Darling Tests, 1987.

[12]  Anja Vogler, et al. An Introduction to Multivariate Statistical Analysis, 2004.

[13]  Peter M. Bentler, et al. Tests of homogeneity of means and covariance matrices for multivariate incomplete data, 2002.

[14]  J. Neyman. "Smooth test" for goodness of fit, 1937.

[15]  G. Box, et al. A general distribution theory for a class of likelihood criteria, 1949, Biometrika.

[16]  R. Little. A Test of Missing Completely at Random for Multivariate Data with Missing Values, 1988.

[17]  T. Ledwina. Data-Driven Version of Neyman's Smooth Test of Fit, 1994.

[18]  D. Rubin. Inference and Missing Data, 1975.

[19]  Peter M. Bentler, et al. EQS: structural equations program manual, 1989.

[20]  Muni S. Srivastava, et al. Multiple imputation and other resampling schemes for imputing missing observations, 2009, J. Multivar. Anal.

[21]  Dimensions of control: Mediational analyses of the stress–health relationship, 2007.

[22]  Gary King, et al. Amelia II: A Program for Missing Data, 2011.