Missing Value Imputation Approach Using Cosine Similarity Measure

Mining incomplete datasets that contain missing values causes various problems in data mining, particularly for large-scale datasets. Missing data is one of the major factors affecting data quality. Missing value imputation offers a way to deal with such incomplete datasets. Although the class center-based approach to imputing missing values gives good results, it relies on Euclidean distance, a linear distance measure, which makes the imputation less accurate. This paper proposes a missing value imputation approach that uses a cosine similarity measure instead. The proposed model combines implicit classification label estimation with a cosine similarity ratio formulation, and it provides better results than the class center-based imputation approach.
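The abstract does not give the paper's exact formulation, so the following is only a minimal sketch of the general idea, under stated assumptions: class centers are taken as the mean of the complete rows in each class, an incomplete row is assigned to the class whose center has the highest cosine similarity over the row's observed features, and the missing entries are filled from that center. The function names (`cosine_similarity`, `class_centers`, `impute`) and the fill rule are illustrative assumptions, not the authors' method, and the paper's cosine similarity ratio formulation is not reproduced here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for zero vectors (assumption)
    return dot / (norm_a * norm_b)


def class_centers(rows, labels):
    """Mean vector per class, computed from complete rows only."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        if any(v is None for v in row):
            continue  # skip rows that have missing values
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def impute(row, centers):
    """Estimate the row's class by cosine similarity on its observed
    features, then fill missing entries (None) from that class center."""
    observed = [i for i, v in enumerate(row) if v is not None]

    def similarity_to(center):
        return cosine_similarity(
            [row[i] for i in observed], [center[i] for i in observed]
        )

    best = max(centers, key=lambda label: similarity_to(centers[label]))
    filled = [centers[best][i] if v is None else v for i, v in enumerate(row)]
    return best, filled


# Hypothetical toy data: two classes with different feature directions.
rows = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 1.0, 0.0], [6.0, 2.0, 0.0]]
labels = ["a", "a", "b", "b"]
centers = class_centers(rows, labels)
estimated_label, completed = impute([1.0, 2.0, None], centers)
```

Because cosine similarity compares direction rather than magnitude, the incomplete row here is matched to class "a" even though its observed values are smaller than those of the class center.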
