Imputation Methods for Longitudinal Data: A Comparative Study

Longitudinal studies play an important role in scientific research. Their defining characteristic is that observations are collected from each subject repeatedly over time or under different conditions. Missing values are common in such studies and pose a fundamental challenge because they can introduce bias, even under well-controlled conditions. Three missing data mechanisms are defined: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Several imputation methods have been developed in the literature to handle missing values in longitudinal data. The most commonly used methods include complete case analysis (CCA), mean imputation (Mean), last observation carried forward (LOCF), hot deck (HOT), regression imputation (Regress), K-nearest neighbor (KNN), the expectation-maximization (EM) algorithm, and multiple imputation (MI). In this article, a comparative study is conducted to investigate the efficiency of these eight methods under the different missing data mechanisms. The comparison is carried out through a simulation study. The results indicate that MI is the most effective method, as it yields the smallest standard errors, while the EM algorithm has the largest relative bias. The methods are also compared in an application to real data.
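To make two of the simpler methods concrete, the following is a minimal sketch of MCAR deletion, LOCF, and mean imputation on a toy set of repeated measures. It is not the code used in the study. The function names (`make_mcar`, `locf`, `mean_impute`), the toy values, and the assumption that the baseline measurement is always observed are illustrative choices only.

```python
import random
from statistics import mean

def make_mcar(data, rate, seed=0):
    """Delete each follow-up value with probability `rate`, independently of
    the data values (MCAR). Assumption: the baseline (first) value is kept."""
    rng = random.Random(seed)
    return [
        [row[0]] + [None if rng.random() < rate else v for v in row[1:]]
        for row in data
    ]

def locf(row):
    """Last observation carried forward within one subject's series."""
    filled, last = [], None
    for v in row:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def mean_impute(data):
    """Replace each missing value with the mean of the values observed
    at the same time point across subjects."""
    n_times = len(data[0])
    col_means = []
    for t in range(n_times):
        observed = [row[t] for row in data if row[t] is not None]
        # If nothing was observed at this time point, leave the gap as None.
        col_means.append(mean(observed) if observed else None)
    return [
        [col_means[t] if v is None else v for t, v in enumerate(row)]
        for row in data
    ]

# Toy data: 3 subjects, 4 time points, None marks a missing value.
toy = [
    [10.0, None, 12.0, None],
    [8.0, 9.0, None, None],
    [11.0, 13.0, 14.0, 15.0],
]

print(locf(toy[0]))         # [10.0, 10.0, 12.0, 12.0]
print(mean_impute(toy)[0])  # [10.0, 11.0, 12.0, 15.0]
```

LOCF borrows information only from the same subject, whereas mean imputation borrows only from other subjects at the same time point. This is one reason the two can behave differently under the MCAR, MAR, and MNAR mechanisms.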
