New fuzzy c-means clustering model based on the data weighted approach

This paper proposes a new data weighted fuzzy c-means clustering approach. Unlike most existing fuzzy clustering approaches, it takes the internal connectivity of all data points into account. The model introduces a vector of exponent impact factors and an influence exponent, which together shape the clustering process. Data weighted clustering simultaneously produces three categories of parameters: fuzzy membership degrees, exponent impact factors, and cluster prototypes. A new fuzzy algorithm, DWG-K, is developed by combining the data weighted approach with the Gustafson-Kessel (G-K) algorithm. Two groups of numerical experiments were carried out. Group 1 compares the clustering performance of DWG-K against G-K; the results show that DWG-K obtains better clustering quality while maintaining the same level of computational efficiency. Group 2 examines the ability of DWG-K to mine outliers, with the well-known LOF (local outlier factor) method as the baseline; the results show that DWG-K has a considerable advantage over LOF in computational efficiency, and that the outliers it mines are global. The paper points out that the data weighted clustering approach has distinct advantages when mining outliers in large-scale data sets, when clustering a data set for better results, and especially when the two tasks are performed simultaneously.
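For orientation, the following is a minimal sketch of the standard fuzzy c-means iteration that the data weighted model builds on. It is not DWG-K: the abstract does not give the update rules for the exponent impact factors or the influence exponent, and the covariance-based distance of G-K is also omitted, so only the two classical parameter sets (membership degrees and cluster prototypes) are computed. The function name `fuzzy_c_means`, its parameters, and the toy data are illustrative assumptions.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means baseline (assumed sketch, not DWG-K).

    points: list of equal-length numeric tuples; c: number of clusters;
    m > 1: fuzzifier. Returns (memberships, prototypes).
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])

    prototypes = []
    for _ in range(iters):
        # Prototype update: mean of the points weighted by u_ik ** m.
        prototypes = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            prototypes.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total
                 for d in range(dim)]
            )

        # Membership update from squared Euclidean distances.
        shift = 0.0
        for i in range(n):
            d2 = [
                sum((points[i][d] - prototypes[k][d]) ** 2 for d in range(dim))
                for k in range(c)
            ]
            zeros = [k for k in range(c) if d2[k] == 0.0]
            if zeros:
                # A point sitting exactly on a prototype belongs fully to it.
                new = [1.0 / len(zeros) if k in zeros else 0.0
                       for k in range(c)]
            else:
                inv = [x ** (-1.0 / (m - 1.0)) for x in d2]
                total = sum(inv)
                new = [x / total for x in inv]
            shift = max(shift, max(abs(new[k] - u[i][k]) for k in range(c)))
            u[i] = new

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break
    return u, prototypes


# Toy usage: two well-separated groups of three points each.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
       (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
memberships, centers = fuzzy_c_means(pts, c=2)
```

Each row of `memberships` sums to 1, which is the constraint that lets a single point be shared between clusters. A data weighted variant would, per the abstract, estimate an additional per-point factor alongside these two outputs.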
