Longitudinal multiple imputation approaches for body mass index or other variables with very low individual-level variability: the mibmi command in Stata

Background
In modern health care systems, the computerization of all aspects of clinical care has led to the development of large data repositories. For example, in the UK, large primary care databases hold millions of electronic medical records, with detailed information on diagnoses, treatments, outcomes and consultations. Careful analyses of these observational datasets of routinely collected data can complement evidence from clinical trials, or even answer research questions that cannot be addressed in an experimental setting. However, ‘missingness’ is a common problem for routinely collected data, especially for biological parameters recorded over time. The absence of complete data for the whole of an individual's study period is a potential source of bias, and standard complete-case approaches may lead to biased estimates. Multiple imputation is a common remedy, but the structure of longitudinal data values makes standard cross-sectional multiple-imputation approaches unsuitable. In this paper we propose and evaluate mibmi, a new Stata command for cleaning and imputing longitudinal body mass index data.

Results
The regression-based data cleaning component of the algorithm can be useful when researchers analyze messy longitudinal data. Although the multiple imputation algorithm is computationally expensive, it performed similarly to, or even better than, existing alternatives when interpolating observations.

Conclusion
The mibmi algorithm can be a useful tool for analyzing longitudinal body mass index data, or other longitudinal data with very low individual-level variability.
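The abstract does not describe mibmi's internals. To show the two ideas it names in rough terms (regression-based cleaning of implausible values, and imputation that interpolates within one individual's trajectory), here is a minimal Python sketch that works on a single individual's series. Several parts are hypothetical illustration choices, not the mibmi algorithm or its Stata syntax: the function names, the leave-one-out outlier rule, and the thresholds (`threshold=3.0`, `min_sd=0.5` in kg/m²). The imputation step is plain stochastic regression imputation. Unlike a proper multiple-imputation procedure, it does not draw the regression parameters, so it understates between-imputation variability. It fills only gaps inside the observed time range, in line with the abstract's focus on interpolation.

```python
import random
from statistics import mean


def fit_line(times, values):
    """Ordinary least squares fit of value = a + b * time; returns (a, b)."""
    t_bar, v_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    if sxx == 0:  # all observations at one time point: use a flat line
        return v_bar, 0.0
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    b = sxy / sxx
    return v_bar - b * t_bar, b


def residual_sd(times, values, a, b):
    """Residual standard deviation of the fitted line (n - 2 degrees of freedom)."""
    resid = [v - (a + b * t) for t, v in zip(times, values)]
    dof = max(len(resid) - 2, 1)
    return (sum(r * r for r in resid) / dof) ** 0.5


def flag_outliers(times, values, threshold=3.0, min_sd=0.5):
    """Hypothetical leave-one-out rule: flag a point lying more than `threshold`
    residual SDs from the line fitted to the individual's other points.
    `min_sd` stops near-constant series from flagging tiny deviations."""
    flags = []
    for i in range(len(values)):
        t_rest = times[:i] + times[i + 1:]
        v_rest = values[:i] + values[i + 1:]
        if len(v_rest) < 3:  # too few points to judge; keep the observation
            flags.append(False)
            continue
        a, b = fit_line(t_rest, v_rest)
        sd = max(residual_sd(t_rest, v_rest, a, b), min_sd)
        flags.append(abs(values[i] - (a + b * times[i])) > threshold * sd)
    return flags


def impute(times, values, missing_times, m=5, seed=0):
    """Stochastic regression imputation for one individual: each of the m
    imputations adds Normal(0, sd) noise to the fitted line. Only gaps inside
    the observed time range are filled (interpolation); others stay None."""
    rng = random.Random(seed)
    a, b = fit_line(times, values)
    sd = residual_sd(times, values, a, b)
    lo, hi = min(times), max(times)
    imputations = []
    for _ in range(m):
        draw = [a + b * t + rng.gauss(0.0, sd) if lo <= t <= hi else None
                for t in missing_times]
        imputations.append(draw)
    return imputations


# Usage: yearly BMI for one person; the year-3 value looks like an entry error.
years = [0, 1, 2, 3, 4, 5]
bmi = [27.1, 27.3, 27.0, 41.0, 27.4, 27.6]
flags = flag_outliers(years, bmi)
clean_t = [t for t, f in zip(years, flags) if not f]
clean_v = [v for v, f in zip(bmi, flags) if not f]
# Year 3 is re-imputed (interpolation); year 6 lies outside the range and stays None.
imputed_sets = impute(clean_t, clean_v, missing_times=[3, 6], m=5, seed=1)
```

Here the implausible year-3 value is removed, and each of the five imputed series receives a year-3 value near the individual's own trend. In a full analysis, each imputed dataset would be analysed separately and the estimates pooled with Rubin's rules. That step is not shown.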
