Somoclu: An Efficient Parallel Library for

Somoclu is a massively parallel tool, written in C++, for training self-organizing maps on large data sets. It builds on OpenMP for multicore execution and on MPI for distributing the workload across the nodes of a cluster. It can also accelerate training with CUDA when graphics processing units are available. A sparse kernel is included, which is useful for high-dimensional but sparse data, such as the vector spaces common in text mining workflows.
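To illustrate why a sparse kernel helps with high-dimensional data, the following is a minimal sketch of a best-matching-unit search over sparse input vectors, parallelized across data instances with an OpenMP pragma. It is not Somoclu's actual implementation: the names `SparseVec`, `squared_norms`, `best_matching_unit`, and `find_bmus`, as well as the row-major codebook layout, are assumptions made for this example. The sketch relies on the identity ||x - w||^2 = ||x||^2 - 2 x·w + ||w||^2, so only the nonzero entries of x are touched when computing the dot product.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sparse input vector: parallel arrays of column indices and values.
struct SparseVec {
    std::vector<std::size_t> idx;
    std::vector<float> val;
};

// Squared norm of each dense codebook row (assumed layout: n_nodes x dim, row-major).
std::vector<float> squared_norms(const std::vector<float>& codebook, std::size_t dim) {
    assert(dim > 0 && codebook.size() % dim == 0);
    std::vector<float> norms(codebook.size() / dim, 0.0f);
    for (std::size_t n = 0; n < norms.size(); ++n)
        for (std::size_t d = 0; d < dim; ++d)
            norms[n] += codebook[n * dim + d] * codebook[n * dim + d];
    return norms;
}

// Index of the node closest to x. The ||x||^2 term is the same for every
// node, so it is dropped: minimizing ||w||^2 - 2 x.w gives the same winner.
std::size_t best_matching_unit(const SparseVec& x,
                               const std::vector<float>& codebook,
                               const std::vector<float>& norms,
                               std::size_t dim) {
    std::size_t best = 0;
    float best_score = std::numeric_limits<float>::max();
    for (std::size_t n = 0; n < norms.size(); ++n) {
        float dot = 0.0f;
        // Only the nonzero entries of x contribute to the dot product.
        for (std::size_t k = 0; k < x.idx.size(); ++k)
            dot += x.val[k] * codebook[n * dim + x.idx[k]];
        const float score = norms[n] - 2.0f * dot;
        if (score < best_score) {
            best_score = score;
            best = n;
        }
    }
    return best;
}

// Best matching units for a batch of sparse vectors. Each instance is
// independent, so the loop can be shared among threads; without OpenMP
// enabled at compile time the pragma is ignored and the loop runs serially.
std::vector<std::size_t> find_bmus(const std::vector<SparseVec>& data,
                                   const std::vector<float>& codebook,
                                   const std::vector<float>& norms,
                                   std::size_t dim) {
    std::vector<std::size_t> bmus(data.size());
    const long long count = static_cast<long long>(data.size());
    #pragma omp parallel for
    for (long long i = 0; i < count; ++i)
        bmus[i] = best_matching_unit(data[i], codebook, norms, dim);
    return bmus;
}
```

In this sketch the cost per instance scales with the number of nonzero entries times the number of map nodes, rather than with the full dimensionality. A distributed variant would presumably give each MPI process its own slice of `data`, but that step is left out here.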
