Vector quantization using information theoretic concepts

The process of representing a large data set with a smaller number of vectors in the best possible way, also known as vector quantization, has been intensively studied in recent years. Highly efficient algorithms such as the Kohonen self-organizing map (SOM) and the Linde-Buzo-Gray (LBG) algorithm have been devised. In this paper a physical approach to the problem is taken, and it is shown that by considering the processing elements as points moving in a potential field, an algorithm can be derived that is as efficient as those mentioned above. Unlike SOM and LBG, this algorithm has a clear physical interpretation and relies on the minimization of a well-defined cost function. It is also shown how the potential-field approach can be linked to information theory through the Parzen density estimator. In light of information theory it becomes clear that minimizing the free energy of the system is in fact equivalent to minimizing a divergence measure between the distribution of the data and the distribution of the processing elements; hence, the algorithm can be seen as a density-matching method.
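As a rough illustration of the density-matching view, the sketch below implements one concrete instance of it: a Gaussian Parzen window for both densities and the Cauchy-Schwarz divergence as the cost function, minimized by gradient descent on the codeword positions. The abstract only commits to "a divergence measure", so the kernel, the divergence, the function names, and all hyperparameters here are assumptions made for the example, not the authors' exact algorithm; the gradient nonetheless decomposes into an attractive force from the data and a repulsive force between codewords, matching the potential-field picture.

```python
# A minimal sketch of the density-matching idea, assuming a Gaussian Parzen
# window and the Cauchy-Schwarz divergence as the concrete divergence measure;
# the abstract only states that *a* divergence between the two densities is
# minimized, so kernel, divergence, and hyperparameters are illustrative.
import numpy as np

def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between the rows of a (n,d) and b (m,d)."""
    return (np.sum(a**2, axis=1)[:, None]
            + np.sum(b**2, axis=1)[None, :]
            - 2.0 * a @ b.T)

def it_vector_quantization(x, n_codewords=16, sigma=0.1, lr=0.05,
                           n_iters=300, seed=0):
    """Gradient descent on J = -2 log V_cross + log V_rep, the constant-free
    Cauchy-Schwarz divergence between Parzen estimates of the data density
    and the codeword density.  V_cross acts as an attractive potential pulling
    codewords toward the data; V_rep is a repulsive potential among the
    codewords themselves: the 'points in a potential field' picture.
    """
    rng = np.random.default_rng(seed)
    w = x[rng.choice(len(x), size=n_codewords, replace=False)].copy()
    m = n_codewords
    two_s2 = 4.0 * sigma**2  # convolving two Gaussian kernels doubles the variance

    for _ in range(n_iters):
        k_xw = np.exp(-pairwise_sq_dists(x, w) / two_s2)  # (n, m) data-codeword kernels
        k_ww = np.exp(-pairwise_sq_dists(w, w) / two_s2)  # (m, m) codeword-codeword kernels
        v_cross = k_xw.mean()  # 'information potential' between the two densities
        v_rep = k_ww.mean()    # 'information potential' of the codewords alone

        # dV_cross/dw_j = (2/two_s2) * (1/(n*m)) * sum_i k_xw[i,j] * (x_i - w_j)
        g_cross = (2.0 / two_s2) * (
            k_xw[:, :, None] * (x[:, None, :] - w[None, :, :])).mean(axis=0) / m
        # dV_rep/dw_j = (4/two_s2) * (1/m^2) * sum_k k_ww[j,k] * (w_k - w_j)
        g_rep = (4.0 / two_s2) * (
            k_ww[:, :, None] * (w[None, :, :] - w[:, None, :])).mean(axis=1) / m

        # Descend J: the cross term pulls codewords toward the data,
        # the repulsion term pushes them apart.
        w -= lr * (-2.0 / v_cross * g_cross + g_rep / v_rep)
    return w

if __name__ == "__main__":
    # Toy usage: quantize a two-component 2-D Gaussian mixture with 16 codewords.
    rng = np.random.default_rng(1)
    x = np.vstack([rng.normal(c, 0.15, size=(200, 2)) for c in ([0, 0], [1, 1])])
    print(it_vector_quantization(x))
```

In practice a slowly shrinking kernel width tends to trade early global ordering against final resolution, much like annealing a neighborhood radius; the fixed sigma above simply keeps the sketch short.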
