Reliable Hierarchical Clustering with the Self-organizing Map

Clustering problems arise in various domains of science and engineering, and a large number of methods have been developed to date. The Kohonen self-organizing map (SOM) is a popular tool that maps a high-dimensional space onto a small number of dimensions by placing similar elements close together, forming clusters. Analysis of the resulting clusters, however, is often left to the user. In this paper we present a method and a set of tools to perform unsupervised SOM cluster analysis, determine cluster confidence, and visualize the result as a tree, facilitating comparison with existing hierarchical classifiers. We also introduce a distance measure for cluster trees that allows selection of the SOM with the most confident clusters.
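
As a rough illustration of how a SOM places similar elements close together, the following is a minimal sketch of a one-dimensional map (a chain of nodes) trained on plain numeric vectors. It is not the implementation described in this paper; the function names `train_som` and `best_matching_unit`, the linear decay schedules, and all default parameter values are assumptions chosen for brevity.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the index of the node whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda j: sum((w - v) ** 2 for w, v in zip(weights[j], x)),
    )


def train_som(data, n_nodes=4, epochs=200, lr0=0.5, radius0=2.0, seed=0):
    """Train a 1-D SOM on a list of equal-length numeric vectors.

    Assumed schedule: learning rate and neighbourhood radius both decay
    linearly with the epoch; the radius is floored at 0.5.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise each node from a randomly chosen data point.
    weights = [list(rng.choice(data)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        radius = max(radius0 * (1.0 - frac), 0.5)
        # Present the data in a fresh random order each epoch.
        for x in rng.sample(data, len(data)):
            bmu = best_matching_unit(weights, x)
            for j in range(n_nodes):
                # Gaussian neighbourhood: nodes near the winner on the
                # chain are pulled toward x more strongly than distant ones.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for k in range(dim):
                    weights[j][k] += lr * h * (x[k] - weights[j][k])
    return weights


if __name__ == "__main__":
    # Two artificial groups of 2-D points (made-up data for illustration).
    group_a = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    group_b = [[10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]]
    som = train_som(group_a + group_b)
    for point in group_a + group_b:
        print(point, "-> node", best_matching_unit(som, point))
```

Each input is assigned to its best-matching node, so inputs that share a node, or that land on neighbouring nodes, form the candidate clusters that a later analysis step would have to delineate.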
