A source coding approach to classification by vector quantization and the principle of minimum description length

An algorithm for supervised classification using vector quantization and entropy coding is presented. The classification rule is formed from a set of training data {(X_i, Y_i)}, i = 1, ..., n, which are independent samples from a joint distribution P_XY. By the principle of minimum description length (MDL), a statistical model that approximates the distribution P_XY ought to enable efficient coding of X and Y. Conversely, a system that encodes (X, Y) efficiently is expected to provide ample information about P_XY. This information can then be used to classify X, i.e., to predict the corresponding Y based on X. To encode both X and Y, a two-stage vector quantizer is applied to X and a Huffman code is formed for Y conditioned on each quantized value of X. Optimizing the encoder is equivalent to designing a vector quantizer with an objective function that reflects the joint penalty of quantization error and misclassification rate. This vector quantizer provides an estimate of the conditional distribution of Y given X, which in turn yields an approximation to the Bayes classification rule. The algorithm, referred to as discriminant vector quantization (DVQ), is compared with learning vector quantization (LVQ) and CART(R) on a number of data sets; DVQ outperforms the other two on several of them. The relation between DVQ, density estimation, and regression is also discussed.
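To make the encoding view above concrete, the following is a minimal Python sketch of a DVQ-style design, not the paper's exact two-stage quantizer: a Lloyd-type alternation in which each training point is assigned to the cell minimizing its squared quantization error plus lam times the idealized code length -log p_hat(y | cell) (standing in for the conditional Huffman code length), and prediction picks the most probable class of the assigned cell. All names (train_dvq, predict_dvq, lam, n_cells), the single-stage quantizer, and the Laplace smoothing are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def train_dvq(X, Y, n_cells=8, lam=1.0, n_iter=20, seed=0):
        """Fit a DVQ-style quantizer: each cell keeps a centroid and an
        empirical conditional distribution of Y; the assignment cost is
        squared quantization error plus lam * (-log p_hat(y | cell))."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        classes = np.unique(Y)
        y_idx = np.searchsorted(classes, Y)
        centroids = X[rng.choice(n, size=n_cells, replace=False)].copy()
        # Start with a uniform conditional distribution in every cell.
        cond = np.full((n_cells, classes.size), 1.0 / classes.size)
        for _ in range(n_iter):
            # Assignment step: joint penalty of distortion and code length.
            dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            code_len = -np.log(cond[:, y_idx].T + 1e-12)   # shape (n, n_cells)
            assign = np.argmin(dist + lam * code_len, axis=1)
            # Update step: recompute centroids and conditional distributions.
            for k in range(n_cells):
                mask = assign == k
                if mask.any():
                    centroids[k] = X[mask].mean(axis=0)
                    counts = np.bincount(y_idx[mask], minlength=classes.size)
                    # Laplace smoothing keeps every code length finite.
                    cond[k] = (counts + 1.0) / (counts.sum() + classes.size)
        return centroids, cond, classes

    def predict_dvq(X, centroids, cond, classes):
        """Quantize X by nearest centroid, then output the most probable
        class of the cell: an approximation to the Bayes rule via the
        estimated conditional distribution of Y given X."""
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = np.argmin(dist, axis=1)
        return classes[np.argmax(cond[assign], axis=1)]

With lam = 0 the alternation reduces to ordinary k-means on X; increasing lam pulls cell boundaries toward class-homogeneous regions, mirroring the trade-off between quantization error and the description length of Y described in the abstract.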
