Missing data: implications for analysis.

Missing data are a ubiquitous problem that complicates the statistical analysis of data arising from studies in nutrition. Missing data are problematic for at least two reasons. First, standard statistical techniques (e.g., t tests, chi-square tests, linear regression) assume that all subjects have complete information on all the relevant variables involved in the analysis. Indeed, as many of our readers can confirm, the standard presentation of statistical methods in introductory and intermediate-level statistics courses implicitly assumes there are no missing data, thereby conveniently sweeping this problem under the proverbial rug. The second, closely related reason concerns how missing data are handled in analyses implemented by many standard statistical software packages. With few exceptions, the default option for handling missing data in most statistical software programs is to exclude them entirely from the analysis. That is, only individuals with complete information on the relevant variables are included in the analysis, and all others are excluded. This default option is commonly referred to as “listwise deletion” or “casewise deletion,” and the subsequent analysis is sometimes referred to as a “complete case analysis.” On the surface, this default option is remarkably simple and has the apparently desirable effect of producing a reduced dataset that is ostensibly free of the problems of missing data and therefore amenable to analysis using conventional techniques.

However, listwise deletion has two direct consequences that are problematic. First, listwise deletion can result in a substantial loss of information. For example, suppose that the analysis of interest is a regression with 10 predictor variables, and each of these predictors has a relatively small probability of being missing, say 5%. Furthermore, if the chance that data on one of the predictors are missing is