Data Management by Self-Organizing Maps

The self-organizing map (SOM) is an automatic data-analysis method. It is widely applied to clustering problems and data exploration in industry, finance, the natural sciences, and linguistics. The most extensive applications, exemplified in this paper, are found in the management of massive textual databases. The SOM is related to classical vector quantization (VQ), which is used extensively in digital signal processing and transmission. As in VQ, the SOM represents a distribution of input data items by a finite set of models. In the SOM, however, these models are automatically associated with the nodes of a regular (usually two-dimensional) grid in an orderly fashion: more similar models become associated with nodes that are adjacent in the grid, whereas less similar models are situated farther apart. This organization, a kind of similarity diagram of the models, provides insight into the topographic relationships of the data, especially of high-dimensional data items. If the data items belong to certain predetermined classes, the models (and the nodes) can be calibrated according to these classes. An unknown input item is then classified according to the node whose model is most similar to it in the metric used to construct the SOM. A new finding introduced in this paper is that an input item can be represented even more accurately by a linear mixture of a few best-matching models. This is achieved by a least-squares fitting procedure in which the coefficients of the linear mixture are constrained to be nonnegative.
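The sketch below uses NumPy to make the pipeline described above concrete: grid-ordered models, calibration and classification by best-matching node, and a nonnegative linear mixture of a few best-matching models. It is illustrative only and rests on several assumptions that the text does not fix:

- Training uses the batch-map form of the SOM with a Gaussian neighborhood.
- The neighborhood radius shrinks linearly over the iterations.
- Squared Euclidean distance serves as the similarity metric.
- Each node is calibrated by majority vote over the training items it wins.

The nonnegative fit is found by trying every subset of the k best-matching models and keeping the best least-squares solution whose coefficients are all nonnegative. This brute-force search stands in for a dedicated nonnegative least-squares solver such as the Lawson-Hanson algorithm. All function names are hypothetical.

```python
import itertools

import numpy as np


def best_matching_units(items, models):
    """Index of the most similar model (squared Euclidean distance) for each item."""
    d2 = ((items[:, None, :] - models[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def train_batch_som(data, rows, cols, n_iter=20, seed=0):
    """Batch-map SOM on a rows x cols grid with a Gaussian neighborhood (assumed)."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Squared grid distances between all pairs of nodes.
    grid_d2 = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(axis=-1)
    # Initialize the models with randomly chosen data items.
    models = data[rng.choice(len(data), rows * cols, replace=False)].copy()
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        # Neighborhood radius shrinks linearly from sigma0 to 1.0 (assumed schedule).
        sigma = sigma0 + (1.0 - sigma0) * t / max(n_iter - 1, 1)
        h = np.exp(-grid_d2 / (2.0 * sigma**2))
        # weights[i, j]: neighborhood between node i and the winner of item j.
        weights = h[:, best_matching_units(data, models)]
        # Each model becomes the neighborhood-weighted mean of all data items.
        models = weights @ data / weights.sum(axis=1, keepdims=True)
    return models


def calibrate(models, data, labels):
    """Label each winning node by majority vote of the training items it wins."""
    bmu = best_matching_units(data, models)
    node_labels = {}
    for node in np.unique(bmu):
        values, counts = np.unique(labels[bmu == node], return_counts=True)
        node_labels[int(node)] = values[counts.argmax()]
    return node_labels


def classify(x, models, node_labels):
    """Class of the calibrated node whose model is most similar to x."""
    nodes = np.array(sorted(node_labels))
    d2 = ((models[nodes] - x) ** 2).sum(axis=1)
    return node_labels[int(nodes[d2.argmin()])]


def nonnegative_mixture(x, models, k=3):
    """Fit x ~ sum_i c_i * m_i over the k best-matching models, with all c_i >= 0.

    Exhaustive search over subsets of those models (a small-k stand-in for a
    dedicated NNLS solver): the constrained optimum equals the unconstrained
    least-squares fit on some subset whose coefficients are all nonnegative.
    """
    best = np.argsort(((models - x) ** 2).sum(axis=1))[:k]
    n_best = len(best)
    coef_best = np.zeros(n_best)  # the all-zero mixture is always feasible
    res_best = float(np.linalg.norm(x))
    for size in range(1, n_best + 1):
        for subset in itertools.combinations(range(n_best), size):
            cols = list(subset)
            a = models[best[cols]].T  # columns are the chosen models
            coef, *_ = np.linalg.lstsq(a, x, rcond=None)
            if np.any(coef < 0):
                continue  # violates the nonnegativity constraint
            res = float(np.linalg.norm(a @ coef - x))
            if res < res_best:
                coef_best = np.zeros(n_best)
                coef_best[cols] = coef
                res_best = res
    return best, coef_best, res_best


# Toy demo: two well-separated classes in 2-D (synthetic data, for illustration only).
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)), rng.normal(10.0, 0.5, size=(50, 2))])
labels = np.array([0] * 50 + [1] * 50)
models = train_batch_som(data, rows=3, cols=3)
node_labels = calibrate(models, data, labels)
print("class of (0.2, -0.1):", classify(np.array([0.2, -0.1]), models, node_labels))
best, coef, res = nonnegative_mixture(np.array([4.0, 4.5]), models, k=3)
print("best-matching nodes:", best, "coefficients:", coef, "residual:", res)
```

Because the search is exhaustive, it performs up to 2^k - 1 least-squares fits. It is therefore practical only for the "few best-matching models" regime the abstract describes. A production version would call a proper NNLS routine instead.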
