A Novel Approach for Imputation of Missing Attribute Values for Efficient Mining of Medical Datasets - Class Based Cluster Approach

Missing attribute values are quite common in the datasets available in the literature. Values may also be missing because not every attribute is recorded, for a variety of practical reasons. In either case, missing attribute values must be handled before any analysis can be carried out, which makes imputation the first step in analyzing medical datasets. The problem has therefore attracted significant attention from researchers in the medical domain, and data mining researchers have proposed a variety of methods and approaches for imputing missing values. However, very few of them address dimensionality reduction. In this paper, we discuss a novel framework for missing value imputation. Our approach to filling missing values is rooted in class-based clustering and essentially aims at reducing the dimensionality of medical records. We use the dimensionality-reduced records for prediction and classification analysis. A case study shows how imputation is performed using the proposed method.
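To make the general idea concrete, the following is a minimal sketch of one way class-based cluster imputation could work. It is not the algorithm of the paper, whose details the abstract does not give. It assumes numeric attributes, one cluster per class label represented by the mean of that class's complete records, Euclidean distance over the observed attributes, and a reduced representation consisting of each record's distances to the class centroids. The function names and the toy data are hypothetical.

```python
import numpy as np

def class_centroids(X, y):
    """Mean vector of the complete records of each class (assumed: one cluster per class)."""
    complete = ~np.isnan(X).any(axis=1)
    classes = np.unique(y[complete])
    centroids = np.array([X[complete & (y == c)].mean(axis=0) for c in classes])
    return classes, centroids

def impute_by_class_centroid(X, y):
    """Fill each missing entry from the nearest class centroid (assumed rule)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    _, centroids = class_centroids(X, y)
    filled = X.copy()
    for i in np.flatnonzero(np.isnan(X).any(axis=1)):
        missing = np.isnan(X[i])
        # Compare the record with each centroid on its observed attributes only.
        dist = np.linalg.norm(centroids[:, ~missing] - X[i, ~missing], axis=1)
        filled[i, missing] = centroids[np.argmin(dist)][missing]
    return filled, centroids

def reduce_dimensions(X, centroids):
    """Map each record to its distances from the class centroids (one column per class)."""
    return np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)

# Hypothetical toy data: two attributes, two classes, one missing value.
X = np.array([[1.0, 2.0], [1.2, 2.2], [8.0, 9.0], [8.2, 9.2], [1.1, np.nan]])
y = np.array([0, 0, 1, 1, 0])

filled, centroids = impute_by_class_centroid(X, y)
reduced = reduce_dimensions(filled, centroids)
print(filled[4])       # the record with the imputed second attribute
print(reduced.shape)   # one distance column per class
```

In this sketch the reduced records have as many columns as there are classes, regardless of the original number of attributes, which is one plausible reading of how clustering by class can also serve dimensionality reduction.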
