Clustering breast cancer data by consensus of different validity indices

Clustering algorithms generally either partition a data set into a pre-specified number of clusters or produce a hierarchy of clusters. In this paper we analyse several clustering techniques and apply them to a particular breast cancer data set. Because the best number of groups is not known a priori, we use a range of validity indices to assess the quality of the clustering results and to determine the best number of clusters. For the K-means method the indices do not fully agree on the best number of clusters, whereas for the PAM algorithm all the indices indicate 4 as the best number of clusters.
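The abstract does not say which validity indices were used or how K-means was initialised. As an illustration only, the sketch below shows the general idea of choosing the number of clusters with a validity index. It makes several assumptions: it uses a single index (the mean silhouette width), hypothetical toy data (three 2-D Gaussian blobs, not the breast cancer data), farthest-first initialisation for K-means, and a simplified PAM (greedy BUILD followed by first-improvement SWAP). For each k in 2..5 it clusters the data and keeps the k with the highest silhouette. A consensus approach would compute several such indices and compare their choices.

```python
import math
import random


def make_blobs(seed=0):
    """Hypothetical toy data: three well-separated 2-D Gaussian blobs (20 points each)."""
    rng = random.Random(seed)
    centres = [(0.0, 0.0), (6.0, 0.0), (3.0, 6.0)]
    return [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in centres for _ in range(20)]


def dist(a, b):
    return math.dist(a, b)  # Euclidean distance


def kmeans(points, k, iters=100):
    """Lloyd's K-means; farthest-first initialisation is an assumption of this sketch."""
    centroids = [points[0]]
    while len(centroids) < k:
        # add the point farthest from all centroids chosen so far
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # keep the old centroid if a cluster becomes empty
            new.append(tuple(sum(xs) / len(members) for xs in zip(*members))
                       if members else centroids[j])
        if new == centroids:  # converged: assignments no longer change
            break
        centroids = new
    return labels


def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid (medoids are indices)."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k):
    """Simplified PAM (k-medoids): greedy BUILD, then SWAP until no swap lowers the cost."""
    n = len(points)
    medoids = []
    for _ in range(k):
        best = min((i for i in range(n) if i not in medoids),
                   key=lambda i: total_cost(points, medoids + [i]))
        medoids.append(best)
    cost = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for mi in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids[:mi] + [h] + medoids[mi + 1:]
                c = total_cost(points, trial)
                if c < cost - 1e-12:  # accept the first improving swap
                    medoids, cost, improved = trial, c, True
    return [min(range(k), key=lambda j: dist(p, points[medoids[j]])) for p in points]


def silhouette(points, labels):
    """Mean silhouette width in [-1, 1]; higher means better-separated clusters."""
    k = max(labels) + 1
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in range(k) if c != labels[i] and labels.count(c) > 0
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def best_k(points, cluster_fn, ks=range(2, 6)):
    """Return the k with the highest silhouette, plus all scores."""
    scores = {k: silhouette(points, cluster_fn(points, k)) for k in ks}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    data = make_blobs()
    for name, fn in [("K-means", kmeans), ("PAM", pam)]:
        k, scores = best_k(data, fn)
        print(name, "best k by silhouette:", k)
```

With only one index there is nothing to disagree with; the disagreement the abstract reports for K-means would appear when several indices, each maximised or minimised according to its own definition, are evaluated on the same set of clusterings.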
