Self-organizing maps with information theoretic learning

Abstract: The self-organizing map (SOM) is a popular clustering and data visualization algorithm that has become a useful tool in pattern recognition and data mining since it was first introduced by Kohonen. However, the magnification factor of such mappings deviates from the information-theoretically optimal value of 1 (for the SOM it is 2/3). This can be attributed to the use of the mean square error to adapt the system, which distorts the mapping by oversampling the low-probability regions. In this work, we first discuss the kernel SOM in terms of a similarity measure called the correntropy induced metric (CIM) and show empirically that it can improve the magnification of the mapping without much increase in the computational complexity of the algorithm. We also show that adapting the SOM in the CIM sense is equivalent to reducing the localized cross information potential, an information-theoretic function that quantifies the similarity between two probability distributions. Using this property, we propose a kernel bandwidth adaptation algorithm for Gaussian kernels, with both homoscedastic and heteroscedastic components. We show that the proposed model can achieve a mapping with optimal magnification and can automatically adapt the parameters of the kernel function.
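To make the idea of adapting a SOM "in the CIM sense" concrete, the following is a minimal sketch and not the authors' implementation. It assumes an unnormalized Gaussian kernel (so that the kernel at zero equals 1), a fixed bandwidth `sigma`, and a Gaussian lattice neighborhood; the function names `cim` and `som_cim_step` and all parameter names are hypothetical. The bandwidth adaptation described in the abstract is not included.

```python
import numpy as np

def gaussian_kernel(d2, sigma):
    """Unnormalized Gaussian kernel evaluated on squared distances d2 (kernel(0) = 1)."""
    return np.exp(-d2 / (2.0 * sigma ** 2))

def cim(x, w, sigma):
    """CIM between input x and each row of w: sqrt(kernel(0) - kernel(x - w))."""
    d2 = np.sum((w - x) ** 2, axis=-1)
    return np.sqrt(1.0 - gaussian_kernel(d2, sigma))

def som_cim_step(weights, grid, x, sigma, lr, nbh_width):
    """One online update of a SOM whose cost is the squared CIM (assumed form).

    weights : (n_nodes, dim) float array, updated in place
    grid    : (n_nodes, grid_dim) lattice coordinates of the nodes
    Returns the index of the winning node.
    """
    d2 = np.sum((weights - x) ** 2, axis=1)
    k = gaussian_kernel(d2, sigma)
    # Minimizing CIM is the same as maximizing the kernel value.
    winner = int(np.argmax(k))
    # Gaussian neighborhood on the lattice, centered on the winner.
    g2 = np.sum((grid - grid[winner]) ** 2, axis=1)
    h = np.exp(-g2 / (2.0 * nbh_width ** 2))
    # Gradient descent on CIM^2 = 1 - k: the usual SOM step (x - w),
    # additionally weighted by the kernel value k and scaled by 1 / sigma^2.
    weights += lr * (h * k)[:, None] * (x - weights) / sigma ** 2
    return winner
```

Compared with the mean-square-error update, the extra factor `k` shrinks the step for nodes that lie far from the input in data space, which is one way to read the abstract's claim that the CIM reduces the influence of low-probability regions. The `1 / sigma ** 2` factor could equally be absorbed into the learning rate.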
