Self-Organized Swarms for Cluster-Preserving Projections of High-Dimensional Data

A new approach to topographic mapping, called Swarm-Organized Projection (SOP), is presented. SOP is inspired by swarm intelligence methods for clustering and is similar to Curvilinear Component Analysis (CCA) and Self-Organizing Maps (SOM). In contrast to these methods, SOP replaces the choice of critical parameters with self-organization. On several benchmark data sets, SOP is shown to outperform many other projection methods. It produces coherent clusters even for complex, entangled, high-dimensional cluster structures. For a nontrivial data set of protein DNA sequences, Multidimensional Scaling (MDS) and CCA fail to represent the clusters in the data, although the clusters are clearly defined. With SOP, the correct clusters could be detected easily.
