A Comparative Study of Clustering Methods for Molecular Data

This study applies three clustering methods to build models of large molecular data sets, comparing low energy samples (LES) with local molecular samples (LMS). Hierarchical clustering, which expresses distance relations as a multi-level tree; a competitive learning network, in which similar inputs fall into the same cluster; and the topology-preserving self-organizing map (SOM) are used to analyze 6,242 LES and 5,000 LMS. In the SOM, the Davies-Bouldin index and the color-map cluster units indicate 24 to 25 clusters in the LES versus only 10 to 12 in the LMS, roughly consistent with the results of hierarchical clustering and the competitive learning network. Hierarchical clustering shows that the largest inter-cluster distance of the LES, about 30, is far greater than that of the LMS, about 10; the largest intra-cluster distance of the LES, about 15, likewise far exceeds that of the LMS, about 3. In the SOM, the D-matrix and U-matrix of the LES contain more high-valued (black) cluster borders, reflecting larger distances and more clusters, than those of the LMS, because the standard deviation range of the LES sample features, -8 to 10, is wider than that of the LMS, -2.5 to 2.5.
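As a minimal illustration of the Davies-Bouldin index used above (not the paper's own code; the toy data and function names are invented for this sketch), the index averages, over all clusters, the worst ratio of within-cluster scatter to between-centroid distance, so well-separated, compact clusters score lower:

```python
import math

def centroid(points):
    """Mean point of a cluster (points are equal-length tuples)."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def davies_bouldin(clusters):
    """Davies-Bouldin index for a partition given as a list of point lists.

    For each cluster i: S_i = mean distance of its points to its centroid.
    For each pair (i, j): R_ij = (S_i + S_j) / dist(centroid_i, centroid_j).
    The index is the mean over i of max_{j != i} R_ij; lower is better.
    """
    cents = [centroid(c) for c in clusters]
    scatter = [sum(dist(p, cen) for p in c) / len(c)
               for c, cen in zip(clusters, cents)]
    k = len(clusters)
    return sum(
        max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
            for j in range(k) if j != i)
        for i in range(k)
    ) / k

# Compact, well-separated clusters give a small index;
# overlapping, diffuse clusters give a larger one.
tight = [[(0, 0), (0, 1), (1, 0)], [(10, 10), (10, 11), (11, 10)]]
loose = [[(0, 0), (3, 3)], [(2, 2), (5, 5)]]
print(davies_bouldin(tight))
print(davies_bouldin(loose))
```

In the paper's setting this index is evaluated over the SOM partition of the LES and LMS feature vectors; the sketch only demonstrates the measure itself on 2-D toy points.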
