Tukey proposed a class of distributions, the g-and-h family (gh family), based on a transformation of a standard normal variable, to accommodate the varying skewness and elongation found in the distributions of variables arising in practical applications. Values are easy to draw from this distribution even though its probability density function is hard to state explicitly. Given this flexibility, the gh family may be extremely useful for creating multiple imputations for missing data. This article demonstrates how this family, as well as its generalizations, can be used in the multiple imputation analysis of incomplete data. The focus is on a scalar variable with missing values. With no additional information available, the data are missing completely at random, and hence the correct analysis is the complete-case analysis. Applying gh multiple imputation to the scalar case therefore allows comparison with the correct analysis and with other model-based multiple imputation methods. Comparisons are made using simulated datasets and data from a survey of adolescents that ascertained driving after drinking alcohol.
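To illustrate why drawing from the gh family is easy, the following is a minimal sketch of the standard form of Tukey's transformation, X = loc + scale * ((exp(g*Z) - 1) / g) * exp(h*Z^2 / 2), where Z is standard normal, g controls skewness, and h controls elongation. The function names `tukey_gh` and `draw_gh`, the `loc` and `scale` parameters, and the parameter values are assumptions made for illustration; the article's own parameterization and its imputation procedure are not given in this abstract and are not reproduced here.

```python
import math
import random

def tukey_gh(z, g=0.0, h=0.0, loc=0.0, scale=1.0):
    """Transform one standard normal value z into a g-and-h value.

    Hypothetical helper: loc and scale are illustrative assumptions.
    """
    # g controls skewness; the g -> 0 limit of (exp(g*z) - 1) / g is z itself.
    core = z if g == 0.0 else math.expm1(g * z) / g
    # h >= 0 controls elongation (tail heaviness); h = 0 leaves the tails as is.
    return loc + scale * core * math.exp(h * z * z / 2.0)

def draw_gh(n, g=0.0, h=0.0, loc=0.0, scale=1.0, seed=None):
    """Draw n values by transforming independent standard normal draws."""
    rng = random.Random(seed)
    return [tukey_gh(rng.gauss(0.0, 1.0), g, h, loc, scale) for _ in range(n)]

# Example: five draws from a right-skewed, heavy-tailed member of the family.
sample = draw_gh(5, g=0.5, h=0.1, seed=1)
print(sample)
```

With g = 0 and h = 0 the transformation reduces to the identity, so the normal distribution is a member of the family. No density evaluation is needed at any point, which is what makes simulation from the family straightforward.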