Missing Data Imputation Approach Based on Incomplete Data Clustering

Missing data processing is an important problem in data pre-processing for data mining. Traditional methods for filling in missing data are mostly based on statistical assumptions, such as a probability distribution, and may not be the most suitable approaches for mining large data sets. Inspired by ROUSTIDA, an incomplete data analysis approach that does not rely on probabilistic or statistical methods, MIBOI is proposed for missing data imputation based on incomplete data clustering. Constraint Tolerance Set Dissimilarity is defined for incomplete data sets of categorical variables, so that the overall dissimilarity of all incomplete data objects in a set can be computed directly, and missing values are imputed according to the results of clustering the incomplete data. Empirical tests on UCI benchmark data sets show that MIBOI is effective and feasible.
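
The general idea of clustering incomplete categorical objects and then imputing from the clusters can be illustrated with a minimal sketch. It is not the MIBOI algorithm: the abstract does not give the definition of Constraint Tolerance Set Dissimilarity, so the sketch substitutes a simple tolerance-style measure (a missing value is treated as compatible with any value), a greedy leader clustering step, and mode-based filling within each cluster. The names MISSING, pairwise_dissimilarity, cluster, impute, and the threshold parameter are all hypothetical and chosen for illustration only.

```python
from collections import Counter

MISSING = None  # assumed marker for a missing categorical value


def pairwise_dissimilarity(a, b):
    """Fraction of attributes on which a and b definitely conflict.

    Assumption: a missing value tolerates any value, so only pairs of
    observed, unequal values count as conflicts. This is a stand-in,
    not the paper's Constraint Tolerance Set Dissimilarity.
    """
    conflicts = sum(
        1
        for x, y in zip(a, b)
        if x is not MISSING and y is not MISSING and x != y
    )
    return conflicts / len(a)


def cluster(objects, threshold):
    """Greedy leader clustering over incomplete objects.

    Each object joins the first cluster whose leader (first member) is
    within the threshold; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of object indices.
    """
    clusters = []
    for i, obj in enumerate(objects):
        for members in clusters:
            leader = objects[members[0]]
            if pairwise_dissimilarity(obj, leader) <= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def impute(objects, clusters):
    """Fill each missing value with the most common observed value of
    that attribute inside the object's cluster. Values stay missing if
    no cluster member has the attribute observed."""
    filled = [list(obj) for obj in objects]
    for members in clusters:
        for col in range(len(objects[0])):
            observed = [
                objects[i][col]
                for i in members
                if objects[i][col] is not MISSING
            ]
            if not observed:
                continue
            mode = Counter(observed).most_common(1)[0][0]
            for i in members:
                if filled[i][col] is MISSING:
                    filled[i][col] = mode
    return filled


# Toy data, invented for this example.
objects = [
    ("red", "small", MISSING),
    ("red", "small", "round"),
    ("red", MISSING, "round"),
    ("blue", "large", "square"),
    ("blue", MISSING, "square"),
]
clusters = cluster(objects, threshold=0.0)
filled = impute(objects, clusters)
```

With a threshold of 0.0, only objects with no conflicting observed values share a cluster, so the toy data splits into a "red" group and a "blue" group, and each gap is filled from its own group. A larger threshold would merge more loosely matching objects, at the cost of less reliable imputed values.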