MIGSOM: A SOM Algorithm for Large Scale Hyperlinked Documents Inspired by Neuronal Migration

The SOM (Self-Organizing Map), one of the most popular unsupervised machine learning algorithms, maps high-dimensional vectors onto a low-dimensional space, usually a two-dimensional map. The SOM is widely regarded as a “scalable” algorithm because it can handle large numbers of records. However, it is effective only when the vectors are small (low-dimensional) and dense. Although a number of studies on making the SOM scalable have been conducted, technical issues with scalability and performance on sparse high-dimensional data, such as hyperlinked documents, still remain. In this paper, we introduce MIGSOM, an SOM algorithm inspired by a recent discovery about neuronal migration. The two major advantages of MIGSOM are its scalability for sparse high-dimensional data and its clustering visualization functionality. We describe the algorithm and its implementation in detail, and demonstrate its practicality in several experiments. We applied MIGSOM not only to experimental data sets but also to a large-scale real data set: Wikipedia’s hyperlink data.
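To make the baseline concrete, here is a minimal sketch of a classic online Kohonen SOM in Python/NumPy. It is not MIGSOM, whose update rules the abstract does not specify. As an assumption standing in for hyperlink data, each input is stored sparsely as an `(indices, values)` pair, for example the IDs of the pages a document links to, each with weight 1.0. The best-matching unit is found with the expansion ‖x − w‖² = ‖x‖² − 2x·w + ‖w‖², so the input-dependent term x·w touches only the nonzero entries of x. The function names `train_som` and `best_matching_unit`, the grid size, and the linear decay schedules are hypothetical choices made for illustration.

```python
import numpy as np


def best_matching_unit(weights, idx, val):
    """Return the (row, col) of the node closest to a sparse input.

    Uses ||x - w||^2 = ||x||^2 - 2 x.w + ||w||^2. The term ||x||^2 is the same
    for every node, so it is dropped; x.w reads only the nonzero entries of x.
    """
    idx = np.asarray(idx)
    val = np.asarray(val, dtype=float)
    dot = weights[:, :, idx] @ val            # shape (rows, cols)
    w_sq = np.sum(weights ** 2, axis=-1)      # shape (rows, cols), still dense
    dist = w_sq - 2.0 * dot
    return np.unravel_index(np.argmin(dist), dist.shape)


def train_som(inputs, dim, rows=5, cols=5, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Classic online SOM (hypothetical sketch, not MIGSOM).

    inputs: list of (indices, values) pairs, a sparse encoding of each vector.
    dim:    dimensionality of the input space.
    Returns the weight grid of shape (rows, cols, dim).
    """
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, dim))
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Grid coordinates of every node, used by the neighbourhood function.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    total_steps = epochs * len(inputs)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(inputs)):
            idx, val = inputs[i]
            # Decay the learning rate and neighbourhood radius linearly.
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            bmu = best_matching_unit(weights, idx, val)
            # Gaussian neighbourhood centred on the BMU in grid coordinates.
            d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))[..., None]
            x = np.zeros(dim)
            x[np.asarray(idx)] = val
            # Dense update: every weight dimension moves, even where x is zero.
            weights += lr * h * (x - weights)
            step += 1
    return weights


# Toy "hyperlink" data: six documents over six target pages, two link groups.
docs = [
    (np.array([0, 1]), np.ones(2)),
    (np.array([1, 2]), np.ones(2)),
    (np.array([0, 2]), np.ones(2)),
    (np.array([3, 4]), np.ones(2)),
    (np.array([4, 5]), np.ones(2)),
    (np.array([3, 5]), np.ones(2)),
]
w = train_som(docs, dim=6, rows=4, cols=4)
for idx, val in docs:
    print(idx, "->", best_matching_unit(w, idx, val))
```

Note that only the distance computation benefits from sparsity here: the weight update and the ‖w‖² term remain dense, so each step still costs on the order of rows × cols × dim operations. That cost is one example of the scalability problem for sparse high-dimensional data that the abstract describes as unresolved.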