Quantization via Empirical Divergence Maximization

Empirical divergence maximization (EDM) refers to a recently proposed strategy for estimating f-divergences and likelihood ratio functions. This paper extends the idea to empirical vector quantization, where one seeks to derive, from data, quantization rules that maximize the Kullback-Leibler divergence between two statistical hypotheses. We analyze the estimator's error convergence rate by leveraging Tsybakov's margin condition and show that rates as fast as n^{-1} are possible, where n is the number of training samples. We also show that the Flynn and Gray algorithm can be used to compute EDM estimates efficiently, and that these estimates can be efficiently and accurately represented by recursive dyadic partitions. The EDM formulation has several advantages. First, it gives access to the tools and results of empirical process theory that quantify the estimator's error convergence rate. Second, it provides a previously unknown derivation of the Flynn and Gray algorithm. Third, its flexibility allows one to avoid a small-cell assumption common to other approaches. Finally, we illustrate the potential use of the method through an example.
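
To make the design objective concrete, below is a minimal, hypothetical Python sketch of divergence-maximizing quantizer design: it grid-searches the threshold of a one-bit scalar quantizer so as to maximize the empirical Kullback-Leibler divergence between the quantized outputs under the two hypotheses. This is only an illustration of the objective; it is not the EDM estimator analyzed in the paper, nor its recursive dyadic partition representation, and all function names and parameters are invented for the example.

```python
import numpy as np

# Hypothetical illustration of divergence-maximizing quantization (not the
# paper's EDM estimator): choose the threshold of a one-bit scalar quantizer
# to maximize the empirical KL divergence between the quantized distributions
# induced by training samples from two hypotheses.

def empirical_kl(p, q, eps=1e-12):
    """KL divergence between two discrete (cell-probability) distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def best_one_bit_threshold(x0, x1, n_grid=200):
    """Grid-search a threshold t; quantizer cells are (-inf, t] and (t, inf)."""
    lo = min(x0.min(), x1.min())
    hi = max(x0.max(), x1.max())
    best_t, best_div = lo, -np.inf
    for t in np.linspace(lo, hi, n_grid):
        # Empirical cell probabilities under each hypothesis.
        p0 = np.array([np.mean(x0 <= t), np.mean(x0 > t)])
        p1 = np.array([np.mean(x1 <= t), np.mean(x1 > t)])
        div = empirical_kl(p1, p0)  # KL of quantized H1 mass from quantized H0 mass
        if div > best_div:
            best_t, best_div = t, div
    return best_t, best_div

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.normal(0.0, 1.0, 1000)  # training samples under hypothesis H0
    x1 = rng.normal(1.0, 1.0, 1000)  # training samples under hypothesis H1
    t, d = best_one_bit_threshold(x0, x1)
    print(f"threshold = {t:.3f}, empirical KL of quantized outputs = {d:.3f} nats")
```

A brute-force grid search is used here only for clarity; the paper instead relies on the Flynn and Gray algorithm for efficient computation and on recursive dyadic partitions for an efficient, accurate representation of the resulting rules.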

[1] Venugopal V. Veeravalli et al., Decentralized detection in sensor networks, 2003, IEEE Trans. Signal Process.

[2] Rui M. Castro et al., Active Learning and Adaptive Sampling for Non-Parametric Inference, 2007.

[3] László Györfi et al., A Probabilistic Theory of Pattern Recognition, 1996, Stochastic Modelling and Applied Probability.

[4] R. Tyrrell Rockafellar et al., Convex Analysis, 1970, Princeton Landmarks in Mathematics and Physics.

[5] S. Geer et al., Square root penalty: Adaptation to the margin in classification and in edge estimation, 2005, math/0507422.

[6] A. Tsybakov et al., Optimal aggregation of classifiers in statistical learning, 2003.

[7] R. Gray et al., Entropy and Information Theory, 1990, Springer New York.

[8] Sanjeev R. Kulkarni et al., Universal Divergence Estimation for Finite-Alphabet Sources, 2006, IEEE Transactions on Information Theory.

[9] S. A. van de Geer et al., Lectures on Empirical Processes: Theory and Statistical Applications, 2007.

[10] Robert M. Gray et al., Encoding of correlated observations, 1987, IEEE Trans. Inf. Theory.

[11] Martin J. Wainwright et al., Estimating Divergence Functionals and the Likelihood Ratio by Convex Risk Minimization, 2008, IEEE Transactions on Information Theory.

[12] H. V. Poor et al., Applications of Ali-Silvey Distance Measures in the Design of Generalized Quantizers for Binary Decision Systems, 1977, IEEE Trans. Commun.

[13] Thomas M. Cover et al., Elements of Information Theory, 2005.

[14] W. Fulks, Advanced Calculus: An Introduction to Analysis, 1969.

[15] Michael A. Lexa, Empirical divergence maximization for quantizer design: An analysis of approximation error, 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[16] Svetlana Lazebnik et al., Supervised Learning of Quantizer Codebooks by Information Loss Minimization, 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17] Robert D. Nowak et al., Minimax-optimal classification with dyadic decision trees, 2006, IEEE Transactions on Information Theory.

[18] John N. Tsitsiklis et al., Extremal properties of likelihood-ratio quantizers, 1993, IEEE Trans. Commun.

[19] Michael A. Lexa et al., Empirical quantization for sparse sampling systems, 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[20] Don H. Johnson et al., Information-theoretic analysis of neural coding, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[21] Don H. Johnson et al., Information-Theoretic Analysis of Neural Coding, 2004, Journal of Computational Neuroscience.

[22] Shun-ichi Amari et al., Methods of information geometry, 2000.

[23] Raphail E. Krichevsky et al., The performance of universal encoding, 1981, IEEE Trans. Inf. Theory.

[24] S. R. Jammalamadaka et al., Empirical Processes in M-Estimation, 2001.

[25] Evgueni A. Haroutunian et al., Information Theory and Statistics, 2011, International Encyclopedia of Statistical Science.

[26] Maurizio Longo et al., Quantization for decentralized hypothesis testing under communication constraints, 1990, IEEE Trans. Inf. Theory.

[27] Alfred O. Hero et al., Information-Geometric Dimensionality Reduction, 2011, IEEE Signal Processing Magazine.

[28] E. Mammen et al., Smooth Discrimination Analysis, 1999.

[29] Minh N. Do et al., Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler distance, 2002, IEEE Trans. Image Process.

[30] Alfred O. Hero et al., High-rate vector quantization for detection, 2003, IEEE Trans. Inf. Theory.

[31] H. Poor et al., Fine quantization in signal detection and estimation, 1988, IEEE Trans. Inf. Theory.

[32] John W. Fisher et al., Nonparametric hypothesis tests for statistical dependency, 2004, IEEE Transactions on Signal Processing.

[33] Alfred O. Hero et al., FINE: Fisher Information Nonparametric Embedding, 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.