A Global Clustering Approach Using Hybrid Optimization for Incomplete Data Based on Interval Reconstruction of Missing Value

Clustering of incomplete data is often encountered in practice. The treatment of missing attribute values and the optimization procedure used for clustering are the main factors affecting clustering performance. In this study, each missing attribute value is treated as an information granule and represented as an interval. To prevent an interval from being determined by samples that belong to different clusters, we propose a congeneric nearest‐neighbor rule based on a preclassification result, which improves the estimation of the missing attribute intervals. Furthermore, we propose a global fuzzy clustering approach that uses particle swarm optimization assisted by Fuzzy C‐Means. The optimization procedure uses a novel encoding scheme in which each particle is composed of the cluster prototypes and the missing attribute values. The proposed approach improves the accuracy of the clustering results and imputes the missing attributes at the same time. Experimental results on several UCI data sets show the efficiency of the proposed approach.
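
As a rough illustration of the two ideas above (interval estimation from same-class neighbors, and particles that carry both prototypes and missing values), the following is a minimal sketch and not the authors' implementation. It assumes a preclassification label is already available for every sample, uses a partial distance over jointly observed attributes to rank neighbors, and takes the interval as the minimum and maximum of the attribute over the `q` nearest same-label neighbors. The names `partial_distance`, `missing_value_intervals`, `encode_particle`, and the parameter `q` are hypothetical.

```python
import numpy as np

def partial_distance(a, b):
    """Squared Euclidean distance over attributes observed in both a and b,
    rescaled by (total attributes / jointly observed attributes).
    Assumption: this is one common choice for incomplete data."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    if not mask.any():
        return np.inf
    diff = a[mask] - b[mask]
    return a.size / mask.sum() * float(np.dot(diff, diff))

def missing_value_intervals(X, labels, q=3):
    """Map each missing entry (i, j) to an interval (low, high) built from
    the q nearest neighbors of sample i that share its preclassification
    label and have attribute j observed."""
    intervals = {}
    for i, j in zip(*np.where(np.isnan(X))):
        candidates = [k for k in range(len(X))
                      if k != i and labels[k] == labels[i]
                      and not np.isnan(X[k, j])]
        if not candidates:
            continue  # no same-label neighbor observes attribute j
        candidates.sort(key=lambda k: partial_distance(X[i], X[k]))
        values = X[candidates[:q], j]
        intervals[(int(i), int(j))] = (float(values.min()), float(values.max()))
    return intervals

def encode_particle(prototypes, intervals, rng):
    """Build one particle: flattened cluster prototypes followed by one
    value per missing entry, drawn uniformly inside its interval."""
    keys = sorted(intervals)
    lows = np.array([intervals[k][0] for k in keys])
    highs = np.array([intervals[k][1] for k in keys])
    missing = rng.uniform(lows, highs)
    return np.concatenate([prototypes.ravel(), missing]), keys

# Toy data: two well-separated groups, one missing value in sample 1.
X = np.array([[1.0, 2.0],
              [1.1, np.nan],
              [0.9, 1.8],
              [5.0, 6.0],
              [5.2, 6.4]])
labels = [0, 0, 0, 1, 1]           # assumed preclassification result
intervals = missing_value_intervals(X, labels, q=2)

rng = np.random.default_rng(0)
prototypes = np.array([[1.0, 1.9], [5.1, 6.2]])  # illustrative starting prototypes
particle, keys = encode_particle(prototypes, intervals, rng)
```

In this toy case the only missing entry, `(1, 1)`, receives the interval `(1.8, 2.0)` from samples 0 and 2, and the particle has five components: four prototype coordinates plus one candidate value for the missing entry. In a swarm-based search, the trailing components would be kept inside their intervals during position updates; how the paper enforces this is not stated in the abstract.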
