Clustering-Based Multiple Imputation via Gray Relational Analysis for Missing Data and Its Application to Aerospace Field

Many scientific studies and industrial applications suffer from missing data. Inappropriate techniques for treating missing values compromise data quality, which in turn harms knowledge discovery. In this paper, we propose a missing data completion method named CBGMI. First, it partitions the nonmissing data instances into several clusters, excluding the entries with missing values. Then, for each incomplete instance, it utilizes the entropy of the proximal category, where proximity is determined by a similarity metric based on gray relational analysis. Experiments on UCI datasets and aerospace datasets demonstrate the superiority of our algorithm over other approaches in terms of validity.
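As a rough illustration of the similarity step described above, the following sketch computes a standard gray relational grade between an incomplete instance and a set of cluster centroids, comparing only the attributes that are observed in the instance. It is not the CBGMI implementation: the function names, the use of `None` to mark missing values, the distinguishing coefficient `rho = 0.5`, and the assumption that attributes are already scaled to comparable ranges are all illustrative choices. The entropy-based step is not shown because the abstract does not specify it.

```python
from typing import List, Optional, Sequence


def gray_relational_grades(
    instance: Sequence[Optional[float]],
    references: Sequence[Sequence[float]],
    rho: float = 0.5,
) -> List[float]:
    """Gray relational grade of `instance` against each reference vector.

    Missing attributes in `instance` are marked with None (an assumption of
    this sketch) and are skipped, so only observed attributes are compared.
    """
    observed = [k for k, v in enumerate(instance) if v is not None]
    if not observed:
        raise ValueError("instance has no observed attributes")

    # Absolute difference for every (reference, observed attribute) pair.
    deltas = [[abs(instance[k] - ref[k]) for k in observed] for ref in references]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)

    if d_max == 0.0:
        # Every reference equals the instance on all observed attributes.
        return [1.0 for _ in references]

    grades = []
    for row in deltas:
        # Gray relational coefficient per attribute, then the mean as the grade.
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


def nearest_cluster(
    instance: Sequence[Optional[float]],
    centroids: Sequence[Sequence[float]],
    rho: float = 0.5,
) -> int:
    """Index of the centroid with the highest gray relational grade."""
    grades = gray_relational_grades(instance, centroids, rho)
    return max(range(len(grades)), key=grades.__getitem__)


if __name__ == "__main__":
    # Hypothetical centroids of two clusters built from complete instances.
    centroids = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
    incomplete = [0.9, None, 0.8]  # second attribute is missing
    print(nearest_cluster(incomplete, centroids))
```

A higher grade means the instance is closer to that centroid, so the missing attribute would be filled using information from the selected cluster. Skipping the missing attributes keeps the comparison from depending on placeholder values.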
