Paper information - Novel hybrid hierarchical-K-means clustering method (H-K-means) for microarray analysis

Hierarchical and K-means clustering are two major analytical tools for unsupervised analysis of microarray datasets. However, both have innate disadvantages. Hierarchical clustering cannot represent distinct clusters with similar expression patterns. Also, as clusters grow in size, the actual expression patterns become less relevant. K-means clustering requires the number of clusters to be specified in advance and chooses initial centroids randomly; in addition, it is sensitive to outliers. We present a novel hybrid approach that combines the merits of the two and discards the disadvantages mentioned above. It differs from the existing hybrid method, which carries out hierarchical clustering in a first round to decide the location and number of clusters and then runs K-means clustering in a second round. In brief, we cluster around half of the data through hierarchical clustering and follow with K-means for the remaining half, all in a single round. Our approach also provides a mechanism to handle outliers. Compared with the existing hybrid clustering approach and with K-means clustering under two different distance measures on Eisen's yeast microarray data, our method always generates much higher-quality clusters.
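For orientation, the existing two-round hybrid that the abstract contrasts against (hierarchical clustering to fix the number and location of clusters, then K-means seeded from them) could be sketched roughly as follows. This is not the authors' H-K-means procedure, whose details the abstract does not give. The centroid linkage, the `cut_distance` stopping threshold, the Euclidean distance, the toy data, and all function names are assumptions made for illustration only.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def agglomerate(points, cut_distance):
    """Round 1 (assumed form): centroid-linkage agglomerative clustering,
    stopped once the two closest clusters are farther apart than cut_distance."""
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = euclidean(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > cut_distance:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

def assign(points, centroids):
    # Label each point with the index of its nearest centroid.
    return [min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
            for p in points]

def kmeans(points, seeds, max_iter=100):
    """Round 2: standard K-means, started from the given seeds
    instead of randomly chosen centroids."""
    centroids = list(seeds)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for k in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == k]
            # Keep the old centroid if a cluster ends up empty.
            updated.append(centroid(members) if members else centroids[k])
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids

def two_round_hybrid(points, cut_distance):
    # The number of clusters and the initial centroids both come from round 1.
    seeds = [centroid(c) for c in agglomerate(points, cut_distance)]
    return kmeans(points, seeds)

# Toy 2-D "expression profiles" forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = two_round_hybrid(points, cut_distance=3.0)
print(labels, centroids)
```

Seeding K-means from the hierarchical result removes the random initialization and the need to choose the number of clusters by hand, but it does nothing about outliers, which is one of the gaps the abstract says the proposed method addresses.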
