Implementation of Self-organizing Maps with Python

Self-Organizing Maps (SOMs), a type of artificial neural network, have been well researched since the 1980s and have been implemented in C, Fortran, R [1] and Python [2]. Python is an efficient high-level language that has been widely used in machine learning for years, but most SOM packages written in Python only perform model construction and visualization. The POPSOM package, written in R, goes beyond this: for example, it can evaluate a model's quality with statistical methods and plot the marginal probability distributions of the neurons. To bring these advantages to Python users, this study migrates the POPSOM package to Python and presents the details of the implementation. The implementation involves three major tasks: 1) migrating the POPSOM package from R to Python; 2) refactoring the source code from the procedural programming paradigm to the object-oriented programming paradigm; 3) improving the package by adding normalization options to the model construction function. In addition to the Python implementation of model construction, Fortran is embedded in the project to significantly accelerate model construction.

With the program completed, its correctness must be verified. The best way to do this is to compare the output of the Python-based program with the output of the R-based program. For the model construction function, the SOM algorithm initializes the weight vectors of the neurons randomly at the very beginning and then selects input vectors randomly during training. Because of these two sources of randomness, the same input (data set) cannot be expected to produce exactly the same output (neurons). Instead, to give evidence that the Python program works properly, two approaches were proposed and applied in this project: 1) measuring the average difference between the vectors of the two neuron sets generated by the R and Python functions respectively; 2) measuring the ratio of the variances and the difference of the feature means for the two neuron sets. Model visualization and the other functions that take neurons as input, by contrast, should return the same results when given the same input (neurons). The details of this verification are presented in the following chapters.
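
The two checks described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the package's actual code: the function names `mean_nearest_distance` and `feature_summary` are hypothetical, the neurons are assumed to be NumPy arrays of shape (number of neurons, number of features), and matching each neuron to its nearest counterpart is an assumption made here because two independently trained maps need not order their neurons identically.

```python
import numpy as np


def mean_nearest_distance(a, b):
    """Average Euclidean distance from each row of `a` to its closest row in `b`.

    Assumption: nearest-neighbour matching stands in for "the average
    difference of vectors"; the original work may pair neurons differently.
    """
    # Pairwise distance matrix of shape (len(a), len(b)) via broadcasting.
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return dist.min(axis=1).mean()


def feature_summary(a, b):
    """Per-feature variance ratio and mean difference of two neuron sets."""
    var_ratio = a.var(axis=0, ddof=1) / b.var(axis=0, ddof=1)
    mean_diff = a.mean(axis=0) - b.mean(axis=0)
    return var_ratio, mean_diff


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for neurons exported from the R and Python models.
    neurons_r = rng.normal(size=(150, 4))
    neurons_py = neurons_r[rng.permutation(150)] + rng.normal(scale=0.01, size=(150, 4))

    print("mean nearest distance:", mean_nearest_distance(neurons_r, neurons_py))
    ratio, diff = feature_summary(neurons_r, neurons_py)
    print("variance ratio per feature:", ratio)
    print("mean difference per feature:", diff)
```

For two maps trained on the same data, a small average distance, variance ratios close to 1 and mean differences close to 0 would be consistent with the two programs behaving alike; what counts as "close" is a judgment that depends on the data and is not fixed by this sketch.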

[1] Guido van Rossum, et al. Python Programming Language, 2007, USENIX Annual Technical Conference.

[2] Gregory T. Breard, et al. Evaluating Self-Organizing Map Quality Measures as Convergence Criteria, 2017.

[3] Hujun Yin, et al. On the Distribution and Convergence of Feature Space in Self-Organizing Maps, 1995, Neural Computation.

[4] Eugene Loh. The ideal HPC programming language, 2010, Commun. ACM.

[5] Karl Pearson F.R.S. LIII. On lines and planes of closest fit to systems of points in space, 1901.

[6] M. Pagano, et al. Student's t test, 1993, Nutrition.

[7] E. Slud. Statistical Computing with R, 2009.

[8] Eric Jones, et al. SciPy: Open Source Scientific Tools for Python, 2001.

[9] David H. Brown. Cartogram data projection for self-organizing maps, 2012.

[10] Martín Abadi, et al. A Theory of Objects, 1996, Monographs in Computer Science.

[11] Marina Fruehauf, et al. Encyclopedia Of Research Design, 2016.

[12] R Core Team, et al. R: A language and environment for statistical computing, 2014.

[13] Lutz Hamel, et al. A Population Based Convergence Criterion for Self-Organizing Maps, 2012.

[14] Paulo Cortez, et al. Modeling wine preferences by data mining from physicochemical properties, 2009, Decis. Support Syst.

[15] Gaël Varoquaux, et al. Scikit-learn: Machine Learning in Python, 2011, J. Mach. Learn. Res.

[16] Benjamin H. Ott. A convergence criterion for self-organizing maps, 2012.

[17] Peter Wittek, et al. Somoclu: An Efficient Parallel Library for, 2015.

[18] Peter Goldsborough, et al. A Tour of TensorFlow, 2016, ArXiv.