Multiple imputation for national public-use datasets and its possible application for gestational age in United States Natality files.

Multiple imputation (MI) is a technique for handling missing data in a public-use dataset. With MI, two or more completed versions of the dataset are created, each containing possibly different but reasonable replacements for the missing data. Users analyse each completed dataset separately with standard techniques and then combine the results using simple formulae that allow the extra uncertainty due to missing data to be assessed. An advantage of this approach is that the resulting public-use data can be analysed by a variety of users for a variety of purposes, without each user needing to devise a method to deal with the missing data. A recent example for a large public-use dataset is the MI of the family income and personal earnings variables in the National Health Interview Survey. We propose using MI to handle the problems of missing gestational ages and implausible birthweight-gestational age combinations in national vital statistics datasets. This paper describes MI and gives examples of MI for public-use datasets, summarises methods that have been used for identifying implausible gestational age values on birth records, and combines these ideas by setting forth scenarios for identifying and then multiply imputing missing and implausible gestational age values. Because missing and implausible gestational age values are not missing completely at random, multiple imputation, which incorporates both the existing relationships among the variables and the uncertainty added by the imputation, may lead to more valid inferences in some analytical studies than simply excluding birth records with inadequate data.
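
The abstract does not spell out the "simple formulae" used to combine results. A minimal sketch follows, assuming they are the standard combining rules usually attributed to Rubin: the pooled estimate is the mean of the completed-data estimates, and the total variance is the average within-imputation variance plus the between-imputation variance inflated by (1 + 1/M). The function name `combine_mi_estimates` and the numbers in the usage lines are hypothetical illustrations, not values from the paper.

```python
import math
from statistics import mean, variance


def combine_mi_estimates(estimates, variances):
    """Pool M completed-data results (assumed: Rubin's combining rules).

    estimates: point estimates, one per completed dataset
    variances: squared standard errors of those estimates
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need two or more estimates, each with a variance")

    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor M - 1)
    t = u_bar + (1 + 1 / m) * b    # total variance, including missing-data uncertainty

    # Reference t-distribution degrees of freedom; infinite if estimates never vary.
    if b == 0:
        df = math.inf
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2

    return {
        "estimate": q_bar,
        "within": u_bar,
        "between": b,
        "total_variance": t,
        "std_error": math.sqrt(t),
        "df": df,
    }


# Hypothetical example: mean gestational age (weeks) from five completed datasets.
pooled = combine_mi_estimates([38.9, 39.1, 39.0, 38.8, 39.2], [0.04] * 5)
print(pooled)
```

The between-imputation term is what carries the extra uncertainty due to missing data: when the completed datasets give nearly identical estimates, the total variance stays close to the within-imputation variance.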
