Self-organizing maps are a type of artificial neural network widely used as a data mining and analysis tool in fields including bioinformatics, financial analysis, signal processing, and experimental physics. They are attractive because they provide a simple yet effective algorithm for clustering and visualizing data via unsupervised learning. A fundamental question regarding self-organizing maps is that of convergence, that is, how well the map models the data after training. Here we introduce a population-based convergence criterion: the neurons of the map are treated as one population and the training data as another. The map is said to have converged if the neuron and training data populations appear to be drawn from the same probability distribution, which can be tested with standard two-sample tests. This paper develops the underpinnings of this approach and then applies the new convergence criterion to real-world data sets. We demonstrate that our convergence criterion can be considered an appropriate model selection criterion.
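To make the population-based idea concrete, the following is a minimal sketch of how such a check might look. The abstract names only "standard two-sample tests", so several choices here are assumptions made for illustration: a two-sample Kolmogorov-Smirnov test, applied separately to each feature, with the asymptotic critical value. The function names `ks_statistic`, `ks_critical`, and `converged_fraction` are hypothetical, and the sketch ignores the fact that trained neurons are not independent draws.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # is attained at one of the observed values.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def ks_critical(n, m, alpha=0.05):
    """Asymptotic critical value for the two-sample KS test
    (an approximation; assumed adequate for this illustration)."""
    c = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c * math.sqrt((n + m) / (n * m))


def converged_fraction(neurons, data, alpha=0.05):
    """Fraction of features for which the test does NOT reject the
    hypothesis that neuron weights and training data share a
    distribution. Testing each feature marginally is an assumption
    of this sketch, not something stated in the abstract."""
    n_features = len(data[0])
    passed = 0
    for j in range(n_features):
        w = [row[j] for row in neurons]  # j-th weight of every neuron
        x = [row[j] for row in data]     # j-th feature of every sample
        if ks_statistic(w, x) <= ks_critical(len(w), len(x), alpha):
            passed += 1
    return passed / n_features


if __name__ == "__main__":
    # Toy data: 200 two-dimensional points; the "map" is a subsample.
    data = [[float(i), float(-i)] for i in range(200)]
    neurons = data[::8]
    print(converged_fraction(neurons, data))
```

A value near 1 would suggest that the neurons resemble the training data in every feature, while a low value flags features the map has not yet captured. Used this way, the score could be compared across maps of different sizes or training lengths, which is the sense in which a convergence criterion can double as a model selection criterion.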