Data-driven sensitivity analysis to detect missing data mechanism with applications to structural equation modelling

Missing data are a common problem in almost all areas of empirical research. Ignoring the missing data mechanism, especially when data are missing not at random (MNAR), can result in biased and/or inefficient inference. Because the MNAR mechanism cannot be verified from the observed data, sensitivity analysis is often used to assess it. Current sensitivity analysis methods primarily assume a model for the response mechanism in conjunction with a measurement model, and examine sensitivity to the missing data mechanism via the parameters of the response model. Recently, Jamshidian and Mata (Post-modelling sensitivity analysis to detect the effect of missing data mechanism, Multivariate Behav. Res. 43 (2008), pp. 432–452) introduced a new method of sensitivity analysis that does not require the difficult task of modelling the missing data mechanism. In this method, a single measurement model is fitted to all of the data and to a sub-sample of the data. Discrepancy in the parameter estimates obtained from the two data sets is used as a measure of sensitivity to the missing data mechanism. Jamshidian and Mata describe their method mainly in the context of detecting data that are missing completely at random (MCAR). They used a bootstrap-type method that relies on heuristic input from the researcher to test for discrepancy between the parameter estimates. Instead of using the bootstrap, the current article obtains confidence intervals for the parameter differences between the two samples based on an asymptotic approximation. Because it does not use the bootstrap, the developed procedure avoids the convergence problems that are likely with bootstrap methods. It does not require heuristic input from the researcher and can be readily implemented in statistical software. The article also discusses methods of obtaining sub-samples that may be used to test for missing at random (MAR) in addition to MCAR.
An application of the developed procedure to a real data set, from the first wave of an ongoing longitudinal study on aging, is presented. Simulation studies using two methods of missing data generation are also performed, and they show promise for the proposed sensitivity method. One of the two methods of missing data generation is new and of interest in its own right.
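The idea of comparing estimates from the full data and from a sub-sample can be illustrated with a toy calculation. The sketch below is not the article's procedure: it replaces the structural equation model parameters with a single sample mean of a fully observed variable, and takes the sub-sample to be the cases where some other variable is observed. Under MCAR that sub-sample behaves like a simple random subset of the n cases, so the difference between the sub-sample mean (m cases) and the full-sample mean has variance sigma^2 (1/m - 1/n), which gives a normal-approximation confidence interval. The function names `mean_difference_ci` and `simulate`, the logistic missingness rule, and the sample sizes are all hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist, mean, variance


def mean_difference_ci(y_all, in_subsample, level=0.95):
    """Return (diff, (lower, upper)) for mean(sub-sample) - mean(all cases).

    y_all        : values of a fully observed variable
    in_subsample : booleans marking the cases kept in the sub-sample
    """
    y_sub = [y for y, keep in zip(y_all, in_subsample) if keep]
    n, m = len(y_all), len(y_sub)
    if not 1 < m < n:
        raise ValueError("sub-sample must be a proper subset with at least 2 cases")
    diff = mean(y_sub) - mean(y_all)
    # The sub-sample is nested in the full sample, so under MCAR
    # Var(diff) = sigma^2 * (1/m - 1/n); sigma^2 is estimated from all cases.
    se = math.sqrt(variance(y_all) * (1.0 / m - 1.0 / n))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return diff, (diff - z * se, diff + z * se)


def simulate(mechanism, n=2000, seed=0):
    """Generate toy data and return the interval under a chosen mechanism."""
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    if mechanism == "MCAR":
        # Every case has the same chance of landing in the sub-sample.
        observed = [rng.random() < 0.6 for _ in y]
    else:
        # Chance of being in the sub-sample increases with y (not MCAR).
        observed = [rng.random() < 1.0 / (1.0 + math.exp(-2.0 * v)) for v in y]
    return mean_difference_ci(y, observed)


if __name__ == "__main__":
    for mechanism in ("MCAR", "dependent"):
        diff, (lower, upper) = simulate(mechanism)
        print(f"{mechanism}: diff={diff:.3f}, CI=({lower:.3f}, {upper:.3f})")
```

An interval that excludes zero signals that the sub-sample estimate differs from the full-sample estimate by more than random sub-sampling would explain, which is the sense in which the discrepancy measures sensitivity to the missing data mechanism. For model parameters, the standard error would have to come from the asymptotic covariance of the two sets of estimates rather than from this simple formula.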
