Approximating Mutual Information by Maximum Likelihood Density Ratio Estimation

Mutual information is useful in various data processing tasks such as feature selection and independent component analysis. In this paper, we propose a new method for approximating mutual information based on maximum likelihood estimation of a density ratio function. Our method, called Maximum Likelihood Mutual Information (MLMI), has several attractive properties: it does not involve density estimation, it is a single-shot procedure, the globally optimal solution can be computed efficiently, and cross-validation is available for model selection. Numerical experiments show that MLMI compares favorably with existing methods.
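To make the density-ratio idea concrete, the sketch below illustrates the general approach in NumPy/SciPy: model the ratio w(x, y) = p(x, y) / (p(x) p(y)) as a non-negative combination of Gaussian kernels, fit the coefficients by maximizing the empirical log-likelihood under a normalization constraint, and plug the fitted ratio into I(X; Y) ≈ (1/n) Σ_i log w(x_i, y_i). This is a minimal sketch under assumed choices (a Gaussian kernel basis, a fixed bandwidth sigma, an all-pairs normalization, and SciPy's SLSQP solver), not the paper's exact algorithm; the function name mlmi and its parameters are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def mlmi(x, y, sigma=1.0, n_basis=50, seed=0):
    """Sketch of maximum-likelihood density-ratio MI estimation.

    x, y: arrays of shape (n, d_x) and (n, d_y) of paired samples.
    Models w(x, y) ~ p(x, y) / (p(x) p(y)) as a non-negative mix of
    Gaussian kernels centered at a random subset of paired samples,
    then returns mean(log w(x_i, y_i)) as the MI estimate (in nats).
    """
    n = len(x)
    rng = np.random.default_rng(seed)
    centers = rng.choice(n, size=min(n_basis, n), replace=False)

    def gauss(a, b):
        # (n, b) Gaussian kernel matrix between rows of a and rows of b.
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    Kx = gauss(x, x[centers])
    Ky = gauss(y, y[centers])
    Phi = Kx * Ky  # basis functions evaluated on the paired samples
    # Averaging each basis over all (x_i, y_j) pairs approximates its
    # expectation under p(x)p(y); this enforces the normalization of w.
    H = Kx.mean(0) * Ky.mean(0)

    def neg_loglik(alpha):
        w = Phi @ alpha
        return -np.mean(np.log(np.maximum(w, 1e-12)))

    cons = {"type": "eq", "fun": lambda a: H @ a - 1.0}
    bounds = [(0, None)] * Phi.shape[1]  # keep the ratio non-negative
    a0 = np.ones(Phi.shape[1]) / H.sum()  # feasible starting point
    res = minimize(neg_loglik, a0, method="SLSQP",
                   bounds=bounds, constraints=cons)
    return np.mean(np.log(np.maximum(Phi @ res.x, 1e-12)))


if __name__ == "__main__":
    # Quick check on correlated Gaussian data, where the true MI is
    # known in closed form: here about 0.80 nats. The estimate should
    # land roughly in that neighborhood.
    rng = np.random.default_rng(1)
    x = rng.normal(size=(200, 1))
    y = x + 0.5 * rng.normal(size=(200, 1))
    print(mlmi(x, y))
```

In practice, the bandwidth sigma (and any regularization) would be chosen by cross-validation over the held-out log-likelihood, which is one of the properties the abstract highlights.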
