Missing value imputation using unsupervised machine learning techniques

In data mining, preprocessing is an essential step that includes data normalization, noise removal, and the handling of missing values. This paper focuses on handling missing values using unsupervised machine learning techniques. Soft computing approaches are combined with clustering techniques to form a novel method for handling missing values, which helps to overcome the problem of inconsistency. A rough K-means centroid-based imputation method is proposed and compared with the K-means centroid-based, fuzzy C-means centroid-based, K-means parameter-based, fuzzy C-means parameter-based, and rough K-means parameter-based imputation methods. The experimental analysis is carried out on four benchmark datasets, viz. the Dermatology, Pima, Wisconsin, and Yeast datasets, which were taken from the UCI data repository. The proposed method proves effective across these datasets, and the results are promising.
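
To make the idea of centroid-based imputation concrete, the following is a minimal sketch of the plain K-means baseline only, not the proposed rough K-means variant. It rests on several assumptions that the abstract does not state: clusters are fitted on the complete records only, an incomplete record is assigned to the nearest centroid using its observed attributes, and each missing entry is filled with that centroid's value. The function names (`farthest_point_init`, `kmeans`, `impute_with_centroids`), the deterministic farthest-point initialization, and the toy data are all hypothetical choices made for illustration.

```python
import math


def farthest_point_init(rows, k):
    """Deterministic seeding (an assumption of this sketch): start from the
    first row, then repeatedly add the row farthest from the chosen centroids."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        nxt = max(rows, key=lambda r: min(math.dist(r, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(rows, k, max_iters=100):
    """Plain Lloyd-style K-means on complete rows (lists of floats)."""
    centroids = farthest_point_init(rows, k)
    for _ in range(max_iters):
        clusters = [[] for _ in range(k)]
        for r in rows:
            j = min(range(k), key=lambda c: math.dist(r, centroids[c]))
            clusters[j].append(r)
        # Recompute each centroid as the attribute-wise mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = [
            [sum(col) / len(members) for col in zip(*members)]
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def impute_with_centroids(data, k):
    """Fill None entries with the value of the nearest centroid, where
    'nearest' is measured on the observed attributes of the record only."""
    complete = [r for r in data if None not in r]
    centroids = kmeans(complete, k)
    filled = []
    for r in data:
        observed = [i for i, v in enumerate(r) if v is not None]
        nearest = min(
            centroids,
            key=lambda c: sum((r[i] - c[i]) ** 2 for i in observed),
        )
        filled.append([nearest[i] if v is None else v for i, v in enumerate(r)])
    return filled


# Toy data (hypothetical): two well-separated groups and two incomplete records.
data = [
    [1.0, 1.0], [1.2, 0.8], [0.8, 1.2],
    [9.0, 9.0], [9.2, 8.8], [8.8, 9.2],
    [1.1, None], [None, 9.1],
]
imputed = impute_with_centroids(data, k=2)
for before, after in zip(data, imputed):
    print(before, "->", after)
```

In this sketch the observed values are never altered; only the `None` entries are replaced. A fuzzy C-means or rough K-means variant would differ in how cluster membership, and hence the imputed value, is determined, and those details are not given in the abstract.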
