A Simplified Framework for Using Multiple Imputation in Social Work Research
Missing data are nearly always a problem in research, and missing values represent a serious threat to the validity of inferences drawn from findings. Increasingly, social science researchers are turning to multiple imputation to handle missing data. Multiple imputation, in which missing values are replaced by values repeatedly drawn from conditional probability distributions, is an appropriate method for handling missing data when values are not missing completely at random. However, use of this method requires developing an imputation model from the observed data. This is typically a rigorous and time-consuming process. To encourage wider adoption of multiple imputation in social work research, a simple framework for designing imputation models is presented. The framework and its ability to generate unbiased estimates are demonstrated in a simulation study.
KEY WORDS: missing data; multiple imputation; nonresponse
**********
Missing data are ubiquitous in social research, and missingness or nonresponse can represent a threat to the validity of inferences because of undue effects on efficiency, power, and parameter bias (Shadish, Cook, & Campbell, 2002). Social work researchers are now addressing missing data in a more rigorous manner. Recently, Saunders et al. (2006) and Choi, Golder, Gillmore, and Morrison (2005) described important data imputation methods and dispelled misunderstandings regarding popular imputation methods, such as mean substitution.
Recent advances in analytic methods, such as multiple imputation (MI), are taking hold in social work research. With MI, missing values are replaced with values repeatedly drawn from simulated conditional probability distributions (Schafer, 1997), thus creating multiple versions of the data set. Each version of the data set is analyzed according to the data analysis model, and the multiple results are combined into point estimates (Rubin, 1996).
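The combining step can be made concrete with a short sketch. The Python function below applies the standard pooling rules attributed to Rubin: the pooled point estimate is the mean of the per-data-set estimates, and the total variance adds the between-imputation variance, inflated by (1 + 1/m), to the average within-imputation variance. The function name `pool_rubin` and the numbers in the usage lines are hypothetical illustrations, not values or code from the article, and the sketch assumes a single scalar parameter.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Pool one scalar parameter across m imputed data sets.

    estimates: the m point estimates, one per imputed data set.
    variances: the m squared standard errors, one per imputed data set.
    Returns (pooled estimate, total variance, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching lengths")

    pooled = statistics.mean(estimates)       # pooled point estimate
    within = statistics.mean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance (divides by m - 1)
    total = within + (1 + 1 / m) * between    # total variance of the pooled estimate
    return pooled, total, math.sqrt(total)


# Hypothetical results from m = 5 imputed data sets (illustrative numbers only).
coefs = [0.50, 0.55, 0.45, 0.52, 0.48]
squared_ses = [0.010, 0.010, 0.010, 0.010, 0.010]
pooled, total, se = pool_rubin(coefs, squared_ses)
print(pooled, total, se)
```

In this illustration the total variance is larger than the average within-imputation variance of 0.010, because the spread of the estimates across imputed data sets carries the extra uncertainty due to the missing values.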
A critical task in MI is to devise an imputation model (Allison, 2002) or missing data model (Graham, Olchowski, & Gilreath, 2007), which involves specifying the measures that are putatively associated with the missing values. Although this process adds steps, the specification of an imputation model and the creation of multiple data sets can produce less-biased estimates in the presence of missing data across a wide variety of data analysis techniques (Schafer, 1997).
Besides MI, there are many other methods for addressing missing data (Schafer, 1999; Schafer & Graham, 2002). An equally rigorous method known as direct or full information maximum likelihood (FIML) estimation can produce unbiased estimates and correct standard errors in the presence of missing data. When the number of imputations is sufficiently large, identical missing data models will produce the same estimates under MI and FIML (Graham et al., 2007). Unlike MI, FIML is limited to maximum likelihood analytic techniques, and the missing data model must be included in the analysis model. Although we focus on MI, the steps we describe for developing an imputation model are equally appropriate for use in the missing data model for FIML.
On the basis of the MI literature, this article describes a framework for developing an imputation model for use with any free or commercial software package that performs MI.
BRIEF REVIEW OF MISSING DATA CONCEPTS
Generally, both MI and a broad range of missing data issues have received ample attention in the applied literature (Graham et al., 2007). We briefly discuss the three types of distributions that describe the randomness of nonresponse, because this property has consequences for the development of an imputation model. For a discussion of general missing data concepts that are not critical to understanding our discussion of imputation model development, we refer readers to Schafer and Graham (2002).
Distribution of Nonresponse
The probability distribution of nonresponse--more frequently referred to as the missing data or nonresponse mechanism (Rubin, 1976)--is both an important factor in the decision to impute with MI and a context for the development of an imputation model. …