Autonomous and deterministic supervised fuzzy clustering with data imputation capabilities

This paper presents a fuzzy model based on an enhanced supervised fuzzy clustering algorithm. The supervised fuzzy clustering algorithm of Janos Abonyi and Ferenc Szeifert (2003) allows each rule to represent more than one output, with a different probability for each output. That algorithm uses k-means to initialize the fuzzy model, which has two main drawbacks: the number of clusters is not known in advance, and the initial cluster positions are generated randomly. In this work, initialization is performed by the global k-means algorithm [2], which can autonomously determine the number of clusters needed and gives a deterministic clustering result. In addition, fast global k-means [2] is used to reduce the computation time. Furthermore, when input data are collected as feature vectors, some feature values of a particular vector may be lost, for example because of a faulty sensor reading. An efficient way to deal with missing values in enhanced supervised fuzzy clustering is imputation during data preprocessing. A modified optimal completion strategy is presented for this purpose, which allows missing data to be imputed with high reliability and accuracy. The autonomous and deterministic enhanced supervised fuzzy clustering, which uses the supervised Gath-Geva clustering method and the modified optimal completion strategy, can be derived from the unsupervised Gath-Geva algorithm. The proposed algorithm is validated on benchmark data sets and on real vibration data collected from the aft gearbox of a U.S. Navy CH-46E helicopter, a data set known as Westland.
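The deterministic initialization described above can be illustrated with a minimal sketch of the basic (not the fast) global k-means procedure: starting from the data mean, one center is added at a time, every data point is tried as the starting position of the new center, and the best k-means solution is kept. The function names `kmeans` and `global_kmeans`, the parameters `max_k` and `min_gain_frac`, and in particular the stopping rule (stop when an extra center reduces the error by less than a fraction of the one-cluster error) are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def kmeans(X, centers, n_iter=100):
    """Lloyd iterations from the given starting centers; returns (centers, sse)."""
    centers = centers.copy()
    for _ in range(n_iter):
        # Squared distance of every point to every center, shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its old center
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.min(axis=1).sum()


def global_kmeans(X, max_k, min_gain_frac=0.1):
    """Incremental, deterministic clustering in the spirit of global k-means.

    The stopping rule is an illustrative assumption: growth stops when the
    best new center lowers the error by at most min_gain_frac times the
    one-cluster error.
    """
    X = np.asarray(X, dtype=float)
    centers = X.mean(axis=0, keepdims=True)  # optimal solution for k = 1
    sse = ((X - centers) ** 2).sum()
    sse_one = sse
    for _ in range(2, max_k + 1):
        best_centers, best_sse = None, np.inf
        for x in X:  # try every data point as the new center's start
            cand, cand_sse = kmeans(X, np.vstack([centers, x]))
            if cand_sse < best_sse:
                best_centers, best_sse = cand, cand_sse
        if sse - best_sse <= min_gain_frac * sse_one:
            break  # an extra cluster no longer pays off
        centers, sse = best_centers, best_sse
    return centers, sse
```

Because no random numbers are involved, repeated calls such as `global_kmeans(X, max_k=6)` on the same data return the same centers. The exhaustive search over all data points costs one k-means run per point and per added center, which is the cost that the fast variant is meant to reduce.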

[1] Pedro Larrañaga, et al. An empirical comparison of four initialization methods for the K-Means algorithm, 1999, Pattern Recognit. Lett.

[2] Nikos A. Vlassis, et al. The global k-means clustering algorithm, 2003, Pattern Recognit.

[3] Sankar K. Pal, et al. Fuzzy models for pattern recognition: methods that search for structures in data, 1992.

[4] James C. Bezdek, et al. Pattern Recognition with Fuzzy Objective Function Algorithms, 1981, Advanced Applications in Pattern Recognition.

[5] Ronald R. Coifman, et al. Entropy-based algorithms for best basis selection, 1992, IEEE Trans. Inf. Theory.

[6] Witold Pedrycz, et al. Data Mining Methods for Knowledge Discovery, 1998, IEEE Trans. Neural Networks.

[7] R. Elashoff, et al. Missing Observations in Multivariate Statistics I. Review of the Literature, 1966.

[8] Donald Gustafson, et al. Fuzzy clustering with a fuzzy covariance matrix, 1978, 1978 IEEE Conference on Decision and Control including the 17th Symposium on Adaptive Processes.

[9] R. R. Hocking, et al. The analysis of incomplete data, 1971.

[10] Brian D. Ripley, et al. Pattern Recognition and Neural Networks, 1996.

[11] Isak Gath, et al. Unsupervised Optimal Fuzzy Clustering, 1989, IEEE Trans. Pattern Anal. Mach. Intell.

[12] H. Hartley. Maximum Likelihood Estimation from Incomplete Data, 1958.

[13] M. Victor Wickerhauser, et al. Adapted wavelet analysis from theory to software, 1994.

[14] Catherine Blake, et al. UCI Repository of machine learning databases, 1998.

[15] L. Jain, et al. Fuzzy sets and their application to clustering and training, 2000.

[16] Roderick J. A. Little, et al. Statistical Analysis with Missing Data, 1988.

[17] Gary G. Yen, et al. Wavelet packet feature extraction for vibration monitoring, 2000, IEEE Trans. Ind. Electron.

[18] Ferenc Szeifert, et al. Supervised fuzzy clustering for the identification of fuzzy classifiers, 2003, Pattern Recognit. Lett.

[19] Keinosuke Fukunaga, et al. Introduction to Statistical Pattern Recognition, 1972.