Comparison Method for Handling Missing Data in Clinical Studies

Missing data is an issue that cannot be avoided, and most data mining algorithms cannot work with data that contain missing values. Complete case analysis, single imputation, multiple imputation, and kNN imputation are among the methods that can be used to handle missing data. Each method has its own advantages and disadvantages. This paper compares these methods on four clinical datasets: chronic kidney disease, Pima Indians diabetes, thyroid, and hepatitis. The accuracy of each method was compared using several classifiers. The experimental results show that the kNN imputation method provides better accuracy than the other methods.
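To make the compared approaches concrete, the following is a minimal sketch of three of them (complete case analysis, single imputation by column mean, and kNN imputation) on a tiny table. It is not the paper's implementation: the function names, the toy values, the choice of k = 2, and the distance rule (Euclidean over the columns both rows have observed, rescaled for skipped columns) are all assumptions made for illustration. Multiple imputation and the classifier-based accuracy comparison are omitted.

```python
import math

def complete_cases(rows):
    """Complete case analysis: keep only rows with no missing value (None)."""
    return [row for row in rows if all(v is not None for v in row)]

def mean_impute(rows):
    """Single imputation: replace each missing value with its column mean."""
    means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]

def _distance(a, b):
    """Euclidean distance over columns observed in both rows (assumed rule),
    rescaled by the fraction of columns that could be compared."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))

def knn_impute(rows, k=2):
    """kNN imputation: fill each gap with the mean of that column over the
    k nearest rows that have the column observed."""
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = [(_distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:  # leave the gap in place if no donor exists
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed

# Hypothetical toy data: two numeric features, one missing value (None).
rows = [
    [1.0, 2.0],
    [1.1, None],
    [5.0, 10.0],
    [0.9, 1.8],
    [5.2, 10.4],
]

print("complete cases:", complete_cases(rows))
print("mean imputation:", mean_impute(rows)[1])
print("kNN imputation:", knn_impute(rows, k=2)[1])
```

In this toy table the incomplete row sits close to the two low-valued rows, so kNN imputation fills the gap from those neighbours, whereas mean imputation pulls it toward the overall column average and complete case analysis discards the row entirely. This illustrates the mechanics of each method only; it says nothing about which method is more accurate on real clinical data.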
