Observer-Biased Fuzzy Clustering

The clusterings (or partitions) generated by clustering algorithms are hypotheses about how the data can be explained, and they are best evaluated by experts from the application domain. In general, clustering algorithms allow only limited use of domain knowledge about the cluster formation process. In this study, we propose both a design technique and a new partitioning-based clustering algorithm that can assist the data analyst in looking for a set of meaningful clusters, i.e., clusters that actually correspond to the underlying data structure. We follow an observer metaphor, according to which the perception of a group of objects depends on the observer's position (the closer an observer is to an image, the more detail he or she perceives). Accordingly, we resort to shrinkage to incorporate a regularization term, accounting for the observation point, into the objective function of an otherwise unbiased clustering algorithm. This technique allows the resulting biased algorithm to generate a set of reasonable partitions, i.e., partitions validated by a given cluster validity index, corresponding to views of the data with different levels of granularity (levels of detail) in different regions of the data space. To illustrate the design technique, we adopt the fuzzy c-means (FCM) algorithm as the unbiased clustering algorithm and include a convergence theorem assuring that changing the point of observation in the corresponding biased algorithm, FCM with focal point (FCMFP), does not jeopardize its convergence. Experimental studies on both synthetic and real data are included to illustrate the usefulness of the approach. As a convenient side effect of using shrinkage, the experimental results suggest that the biased algorithm (FCMFP) not only seems to scale better than successive runs of the unbiased one (FCM) but, on average, also seems to produce clusters with higher validity index values. The biased algorithm was also observed to be less sensitive to initialization than the unbiased one.
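The idea of adding a shrinkage term to the FCM objective can be illustrated with a small sketch. The abstract does not give the exact form of the regularization term, so the code below assumes a simple quadratic penalty, zeta times the sum of squared distances between each cluster center and the focal point, added to the standard FCM objective. Under that assumption the membership update is unchanged and each center becomes a weighted average of the data and the focal point. The names `fcm_focal`, `focal`, and `zeta` are hypothetical, and this should not be read as the FCMFP algorithm as published.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fcm_focal(data, c, focal, zeta=0.0, m=2.0, iters=100, tol=1e-6, seed=0):
    """Fuzzy c-means with an ASSUMED quadratic pull of the centers toward `focal`.

    Assumed objective (not taken from the paper):
        sum_k sum_i u[k][i]**m * ||x_k - v_i||^2 + zeta * sum_i ||v_i - focal||^2
    With zeta = 0 this reduces to plain FCM.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(data, c)]
    dim = len(focal)
    u = []
    for _ in range(iters):
        # Standard FCM membership update; distances are clamped to avoid 0-division.
        u = []
        for x in data:
            d = [max(sq_dist(x, v), 1e-12) for v in centers]
            inv = [dk ** (-1.0 / (m - 1.0)) for dk in d]
            total = sum(inv)
            u.append([w / total for w in inv])
        # Center update: setting the gradient of the assumed objective to zero
        # gives a weighted mean of the data shrunk toward the focal point.
        new_centers = []
        for i in range(c):
            w = [row[i] ** m for row in u]
            wsum = sum(w)
            new_centers.append([
                (sum(wk * x[j] for wk, x in zip(w, data)) + zeta * focal[j])
                / (wsum + zeta)
                for j in range(dim)
            ])
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    # `u` holds the memberships computed just before the last center update.
    return centers, u


if __name__ == "__main__":
    points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    for z in (0.0, 1.0, 1e6):
        found, _ = fcm_focal(points, c=2, focal=[0.0, 0.0], zeta=z)
        print(z, found)
```

In this sketch a small `zeta` leaves the centers close to the plain FCM solution, while a very large `zeta` pulls every center onto the focal point, so varying `zeta` (and the focal point) is one way to picture views of the data at different levels of granularity.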
