Joint Entropy Maximization in Kernel-Based Topographic Maps

A new learning algorithm for kernel-based topographic map formation is introduced. The kernel parameters are adjusted individually so as to maximize the joint entropy of the kernel outputs. This is done by maximizing the differential entropies of the individual kernel outputs while minimizing the map's output redundancy, which arises from kernel overlap. The redundancy is reduced by minimizing the mutual information between the kernel outputs. The (radial) incomplete gamma distribution is taken as the kernel because, for a Gaussian input density, it maximizes the differential entropy of the kernel output. Since the theoretically optimal joint entropy performance can be derived for the case of nonoverlapping Gaussian mixture densities, a new clustering algorithm is suggested that uses this optimum as its null distribution. Finally, it is shown that the learning algorithm is similar to one that performs stochastic gradient descent on the Kullback-Leibler divergence for a heteroskedastic Gaussian mixture density model.
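
The claim about the incomplete gamma kernel can be checked numerically. The sketch below is an illustration, not the paper's implementation. It assumes the kernel is the regularized upper incomplete gamma function Q(d/2, r^2 / (2 sigma^2)), with r the Euclidean distance from the kernel center, and that the kernel center and radius match the mean and standard deviation of an isotropic Gaussian input. Under those assumptions r^2 / sigma^2 is chi-squared distributed with d degrees of freedom, so the kernel output is uniform on [0, 1], which is the maximum-entropy distribution on a bounded interval. The function names, the restriction to even d (chosen so that a closed form avoids external libraries), and the histogram entropy estimate are all choices made for this example.

```python
import math
import random


def upper_gamma_kernel(x, center, sigma):
    """Regularized upper incomplete gamma Q(d/2, r^2 / (2 sigma^2)).

    Assumption for this sketch: d is even, so the closed form
    Q(a, u) = exp(-u) * sum_{k=0}^{a-1} u^k / k!  (integer a) applies.
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("this sketch only handles even input dimension")
    r2 = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    u = r2 / (2.0 * sigma ** 2)
    term, total = 1.0, 1.0
    for k in range(1, d // 2):
        term *= u / k          # term is now u^k / k!
        total += term
    return math.exp(-u) * total


def histogram_entropy(samples, bins=20):
    """Entropy (in nats) of samples in [0, 1] over equal-width bins."""
    counts = [0] * bins
    for s in samples:
        counts[min(int(s * bins), bins - 1)] += 1
    n = len(samples)
    return -sum(c / n * math.log(c / n) for c in counts if c)


rng = random.Random(0)
d, sigma, bins = 4, 1.5, 20
center = [0.0] * d

# Gaussian inputs matched to the kernel's center and radius.
outputs = [
    upper_gamma_kernel([rng.gauss(0.0, sigma) for _ in range(d)], center, sigma)
    for _ in range(20000)
]

entropy = histogram_entropy(outputs, bins)
max_entropy = math.log(bins)   # attained only by a uniform histogram
print(f"histogram entropy: {entropy:.4f} nats (maximum {max_entropy:.4f})")
```

If the kernel radius is mismatched to the input spread, the output histogram should become skewed and the estimated entropy should fall below the maximum. This is the sense in which adjusting each kernel's parameters can raise the differential entropy of its output.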
