A Reproducing Kernel Hilbert Space Framework for Information-Theoretic Learning

This paper provides a functional analysis perspective on information-theoretic learning (ITL) by defining, bottom-up, a reproducing kernel Hilbert space (RKHS) uniquely determined by the symmetric nonnegative definite kernel function known as the cross-information potential (CIP). The CIP, defined as the integral of the product of two probability density functions, characterizes the similarity between two stochastic functions. We prove the existence of a one-to-one congruence mapping between the ITL RKHS and the Hilbert space spanned by square-integrable probability density functions. Therefore, all the statistical descriptors in the original ITL formulation can be rewritten as algebraic computations on deterministic functional vectors in the ITL RKHS, instead of limiting the functional view to the estimators, as is commonly done in kernel methods. A connection is also established between the ITL RKHS and kernel approaches that quantify the statistics of projected data.
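To make the CIP concrete, the sketch below computes V(f, g) = ∫ f(x) g(x) dx in two ways. It assumes one-dimensional data and Gaussian Parzen kernels of width sigma, and the helper names (`gaussian`, `cip_parzen`, `cip_gaussians`) are hypothetical illustrations, not code from the paper. With Gaussian kernels, the integral of the product of two kernels is again a Gaussian, evaluated at the difference of the samples, with variance 2·sigma². The sample estimate therefore reduces to a double sum over sample pairs. For two Gaussian densities, the CIP also has a closed form that can serve as a sanity check.

```python
import math
import random


def gaussian(u, var):
    """Zero-mean 1-D Gaussian density with variance `var`, evaluated at u."""
    return math.exp(-u * u / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def cip_parzen(xs, ys, sigma):
    """Parzen-window estimate of the CIP V(f, g) = integral of f(x) g(x) dx.

    Assumption: f and g are estimated from samples xs and ys with Gaussian
    kernels of width sigma. The integral of the product of two such kernels
    is a Gaussian of variance 2 * sigma**2 evaluated at the sample difference,
    so the estimate is an average over all sample pairs.
    """
    var = 2.0 * sigma ** 2
    total = sum(gaussian(x - y, var) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def cip_gaussians(mu1, var1, mu2, var2):
    """Closed-form CIP of N(mu1, var1) and N(mu2, var2): a Gaussian in mu1 - mu2."""
    return gaussian(mu1 - mu2, var1 + var2)


if __name__ == "__main__":
    random.seed(0)
    xs = [random.gauss(0.0, 1.0) for _ in range(500)]
    ys = [random.gauss(1.0, 1.0) for _ in range(500)]
    est = cip_parzen(xs, ys, sigma=0.1)
    exact = cip_gaussians(0.0, 1.0, 1.0, 1.0)
    print(f"Parzen estimate: {est:.4f}, closed form: {exact:.4f}")
```

Both functions are symmetric in their two arguments. The closed form also satisfies V(f, f)·V(g, g) ≥ V(f, g)², which is the 2×2 case of the nonnegative definiteness that lets the CIP serve as a reproducing kernel. A small kernel width keeps the Parzen bias low but raises the variance of the estimate. The choice of sigma = 0.1 here is arbitrary.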
