Essentials of the self-organizing map

The self-organizing map (SOM) is an automatic data-analysis method. It is widely applied to clustering problems and data exploration in industry, finance, natural sciences, and linguistics. The most extensive applications, exemplified in this paper, are found in the management of massive textual databases and in bioinformatics. The SOM is related to classical vector quantization (VQ), which is used extensively in digital signal processing and transmission. As in VQ, the SOM represents a distribution of input data items by a finite set of models. In the SOM, however, these models are automatically associated with the nodes of a regular (usually two-dimensional) grid in an orderly fashion: more similar models become associated with nodes that are adjacent in the grid, whereas less similar models end up farther apart from each other. This organization, a kind of similarity diagram of the models, provides insight into the topographic relationships of the data, especially of high-dimensional data items. If the data items belong to certain predetermined classes, the models (and the nodes) can be calibrated according to these classes. An unknown input item is then classified according to the node whose model is most similar to it in the metric used to construct the SOM. A new finding introduced in this paper is that an input item can be represented even more accurately by a linear mixture of a few best-matching models. This is achieved by a least-squares fitting procedure in which the coefficients of the linear mixture are constrained to nonnegative values.
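The ordering, calibration, and classification steps described above can be illustrated with a minimal sketch in Python with NumPy. It assumes Euclidean distance, a rectangular grid, a Gaussian neighborhood whose radius shrinks linearly, and the batch form of SOM training, in which each model is replaced by a neighborhood-weighted mean of the input items. These choices, the function names, and the toy two-class data are illustrative assumptions, not the paper's own implementation.

```python
import numpy as np


def grid_coords(rows, cols):
    """Return the (row, col) position of every node of a rectangular grid."""
    r, c = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    return np.column_stack([r.ravel(), c.ravel()]).astype(float)


def best_matching_units(data, models):
    """Index of the most similar model (Euclidean distance) for every input row."""
    d2 = ((data[:, None, :] - models[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def train_batch_som(data, rows, cols, n_iter=30, sigma0=None, seed=0):
    """Batch SOM (assumed variant): each model becomes a neighborhood-weighted mean."""
    rng = np.random.default_rng(seed)
    coords = grid_coords(rows, cols)
    # Initialize the models with randomly chosen input items.
    models = data[rng.choice(len(data), rows * cols, replace=True)].astype(float)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Squared grid distances between every pair of nodes.
    grid_d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    for t in range(n_iter):
        # Shrink the neighborhood radius linearly from sigma0 down to 0.5.
        sigma = sigma0 + (0.5 - sigma0) * t / max(n_iter - 1, 1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        bmu = best_matching_units(data, models)
        # weights[i, k]: neighborhood value between node i and the winner of input k.
        weights = h[:, bmu]
        models = weights @ data / weights.sum(axis=1, keepdims=True)
    return models, coords


def calibrate(data, labels, models):
    """Label each node with the majority class of the inputs it wins (-1 if none)."""
    bmu = best_matching_units(data, models)
    node_labels = np.full(len(models), -1)
    for node in range(len(models)):
        hits = labels[bmu == node]
        if hits.size:
            node_labels[node] = np.bincount(hits).argmax()
    return node_labels


def classify(x, models, node_labels):
    """Classify x by its best-matching node, searching only calibrated nodes."""
    labeled = np.flatnonzero(node_labels >= 0)
    idx = best_matching_units(x[None, :], models[labeled])[0]
    return int(node_labels[labeled[idx]])


# Usage on toy data: two well-separated 3-D clusters with class labels 0 and 1.
rng = np.random.default_rng(42)
data = np.vstack([rng.normal(0.0, 0.3, (50, 3)), rng.normal(5.0, 0.3, (50, 3))])
labels = np.array([0] * 50 + [1] * 50)
models, coords = train_batch_som(data, rows=4, cols=4)
node_labels = calibrate(data, labels, models)
print(classify(np.zeros(3), models, node_labels))
print(classify(np.full(3, 5.0), models, node_labels))
```

In this sketch, classification searches only the nodes that won at least one labeled training item, so an unknown input is never assigned to an uncalibrated node. That restriction is a design choice of the example rather than something stated in the text.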

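The nonnegative linear mixture of a few best-matching models can also be sketched in a few lines. Which models enter the fit, and how many of them (`k` below), are assumptions here: the sketch simply takes the `k` models nearest to the input in Euclidean distance. Because `k` is small, the nonnegative least-squares problem is solved exactly by enumerating every possible set of active coefficients. The helper names `nnls_small` and `mixture_of_best_models` and the toy models are hypothetical.

```python
import itertools

import numpy as np


def nnls_small(A, b):
    """Exact nonnegative least squares, min ||A @ c - b|| subject to c >= 0.

    Enumerates every support (subset of columns), solves an unconstrained
    least-squares problem on it, and keeps the best feasible solution.
    Practical only for a handful of columns (2**k solves).
    """
    k = A.shape[1]
    best_c, best_res = np.zeros(k), np.linalg.norm(b)
    for size in range(1, k + 1):
        for support in itertools.combinations(range(k), size):
            cols = list(support)
            sol, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
            if np.all(sol >= 0):
                c = np.zeros(k)
                c[cols] = sol
                res = np.linalg.norm(A @ c - b)
                if res < best_res:
                    best_c, best_res = c, res
    return best_c, best_res


def mixture_of_best_models(x, models, k=3):
    """Represent x as a nonnegative combination of its k best-matching models."""
    d2 = ((models - x) ** 2).sum(axis=1)
    best = np.argsort(d2)[:k]  # indices of the k most similar models
    coeffs, residual = nnls_small(models[best].T, x)
    return best, coeffs, residual


# Usage: x lies between models 0 and 1, so no single model matches it exactly.
models = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [1.0, 1.0, 2.0]])
x = 0.7 * models[0] + 0.3 * models[1]
best, coeffs, residual = mixture_of_best_models(x, models, k=3)
single_model_error = np.linalg.norm(x - models[best[0]])
print(best, coeffs, residual, single_model_error)
```

In this toy case the mixture reproduces `x` almost exactly, while the single best-matching model leaves a clearly nonzero error. For larger `k`, enumerating all supports becomes impractical, and an active-set solver such as `scipy.optimize.nnls`, which implements the Lawson-Hanson algorithm, is the usual choice.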