A Scalable Parallel Algorithm for Self-Organizing Maps with Applications to Sparse Data Mining Problems

We describe a scalable parallel implementation of the self-organizing map (SOM) suitable for data-mining applications involving clustering or segmentation of large data sets, such as those encountered in the analysis of customer spending patterns. The parallel algorithm is based on the batch SOM formulation, in which the neural weights are updated at the end of each pass over the training data. The underlying serial algorithm is enhanced to take advantage of the sparseness often encountered in these data sets. Analysis of a realistic test problem shows that the batch SOM algorithm captures the key features observed with the conventional on-line algorithm, with comparable convergence rates. Performance measurements on an SP2 parallel computer are given for two retail data sets and a publicly available set of census data. These results demonstrate essentially linear speedup for the parallel batch SOM algorithm, both for a memory-contained sparse formulation and for a separate implementation in which the mining data are accessed directly from a parallel file system. We also present visualizations of the census data to illustrate the value of the clustering information obtained via the parallel SOM method.
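The batch formulation summarized above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`bmu`, `accumulate`, `batch_epoch`), the Gaussian neighborhood, the dict-based sparse record format, and the choice to partition records across workers are all assumptions made for illustration. Each shard stands in for one worker's slice of the training data, and the summation of partial accumulators stands in for a global sum reduction in a message-passing setting.

```python
import math


def bmu(record, weights, sq_norms):
    """Index of the best-matching unit for a sparse record {feature: value}.

    Uses ||x - w||^2 = ||w||^2 - 2 x.w + ||x||^2. The ||x||^2 term is the same
    for every unit, so it is dropped, and x.w visits only the nonzeros of x.
    """
    best, best_score = 0, float("inf")
    for k, w in enumerate(weights):
        dot = sum(v * w[j] for j, v in record.items())
        score = sq_norms[k] - 2.0 * dot
        if score < best_score:
            best, best_score = k, score
    return best


def accumulate(shard, weights):
    """One worker's pass: per-unit sums of records and hit counts."""
    n_units, dim = len(weights), len(weights[0])
    sq_norms = [sum(x * x for x in w) for w in weights]  # once per pass
    num = [[0.0] * dim for _ in range(n_units)]
    den = [0.0] * n_units
    for record in shard:
        c = bmu(record, weights, sq_norms)
        den[c] += 1.0
        for j, v in record.items():  # sparse accumulation
            num[c][j] += v
    return num, den


def batch_epoch(shards, weights, grid, sigma):
    """One batch SOM epoch; weights change only after all records are seen."""
    n_units, dim = len(weights), len(weights[0])
    num = [[0.0] * dim for _ in range(n_units)]
    den = [0.0] * n_units
    for shard in shards:  # each iteration stands in for one worker
        p_num, p_den = accumulate(shard, weights)
        for k in range(n_units):  # stand-in for a global sum reduction
            den[k] += p_den[k]
            for j in range(dim):
                num[k][j] += p_num[k][j]
    new_weights = []
    for k in range(n_units):
        top, bottom = [0.0] * dim, 0.0
        for c in range(n_units):
            d2 = (grid[k][0] - grid[c][0]) ** 2 + (grid[k][1] - grid[c][1]) ** 2
            h = math.exp(-d2 / (2.0 * sigma * sigma))  # assumed Gaussian kernel
            bottom += h * den[c]
            for j in range(dim):
                top[j] += h * num[c][j]
        # Keep the old weight if no record influenced this unit.
        new_weights.append([t / bottom for t in top] if bottom > 0 else list(weights[k]))
    return new_weights
```

Because the weights are held fixed during a pass, the accumulated sums do not depend on how the records are split across shards (up to floating-point rounding), which is the property that makes the batch formulation straightforward to parallelize over the data.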
