Application of Multiple Imputation using the Two-Fold Fully Conditional Specification Algorithm in Longitudinal Clinical Data

Electronic health records of longitudinal clinical data are a valuable resource for health care research. One obstacle to using databases of health records in epidemiological analyses is that general practitioners mainly record data when they are clinically relevant, so many measurements are unavailable. If we treat the unavailability of measurements as a missing-data problem, we can handle it with existing methods such as multiple imputation (MI). However, most software implementations of MI do not take account of the longitudinal and dynamic structure of the data and are difficult to apply in large databases with millions of individuals and long follow-up. Nevalainen, Kenward, and Virtanen (2009, Statistics in Medicine 28: 3657–3669) proposed the two-fold fully conditional specification algorithm to impute missing values in longitudinal data. The algorithm imputes missing values at a given time point, conditional on information at the same time point and at immediately adjacent time points. In this article, we describe a new command, twofold, that implements the two-fold fully conditional specification algorithm. In this implementation, the algorithm is extended to accommodate MI of longitudinal clinical records in large databases.
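To make the conditioning structure concrete, the following is a minimal Python sketch of which values enter each conditional imputation model and in what order the time points are visited. It is not the twofold command or its syntax. The function names (`predictors`, `schedule`), the variable names (`sbp`, `weight`), the `width` parameter, and the iteration counts are all hypothetical. The sketch also assumes that the target variable's own values at adjacent time points are used as predictors, and it omits the imputation model itself.

```python
from typing import Iterator, List, Tuple

Cell = Tuple[str, int]  # (variable name, time index)


def predictors(target: str, t: int, variables: List[str],
               n_times: int, width: int = 1) -> List[Cell]:
    """Cells conditioned on when imputing `target` at time `t`.

    Uses every variable inside the window [t - width, t + width],
    clipped to the observed time range, except the target cell itself.
    """
    lo, hi = max(0, t - width), min(n_times - 1, t + width)
    return [(v, s)
            for s in range(lo, hi + 1)
            for v in variables
            if not (v == target and s == t)]


def schedule(variables: List[str], n_times: int, among_iters: int,
             within_iters: int, width: int = 1
             ) -> Iterator[Tuple[str, int, List[Cell]]]:
    """Yield (target, time, predictor cells) in visiting order.

    Outer loop: among-time iterations over all time points.
    Inner loop: within-time iterations over the variables at one time point.
    A real implementation would fit a model and draw imputations at each step.
    """
    for _ in range(among_iters):
        for t in range(n_times):
            for _ in range(within_iters):
                for v in variables:
                    yield v, t, predictors(v, t, variables, n_times, width)


if __name__ == "__main__":
    for target, t, preds in schedule(["sbp", "weight"], n_times=3,
                                     among_iters=1, within_iters=1):
        print(target, t, preds)
```

Because each model conditions only on a narrow window of time points, the number of predictors per model stays fixed as follow-up grows, which is what makes this kind of approach attractive for long follow-up in large databases.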
