Multivariate Extension of Matrix-Based Rényi's $\alpha$-Order Entropy Functional

The matrix-based Rényi's $\alpha$-order entropy functional was recently introduced using the normalized eigenspectrum of a Hermitian matrix of the projected data in a reproducing kernel Hilbert space (RKHS). However, the current theory of the matrix-based Rényi's $\alpha$-order entropy functional defines only the entropy of a single variable or the mutual information between two random variables. In the information theory and machine learning communities, one is also frequently interested in multivariate information quantities, such as the multivariate joint entropy and various interaction quantities among multiple variables. In this paper, we first define the matrix-based Rényi's $\alpha$-order joint entropy among multiple variables. We then show how this definition eases the estimation of various information quantities that measure interactions among multiple variables, such as interaction information and total correlation. Finally, we present an application to feature selection to demonstrate how our definition provides a simple yet powerful way to estimate a quantity widely acknowledged to be intractable from data. A real example on hyperspectral image (HSI) band selection is also provided.
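The estimator described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: it assumes a Gaussian kernel for the Gram matrix, trace normalization so the eigenvalues form a probability-like spectrum, the standard matrix-based entropy $S_\alpha(A) = \frac{1}{1-\alpha}\log_2 \operatorname{tr}(A^\alpha)$, and a multivariate joint entropy built from the Hadamard (element-wise) product of the per-variable Gram matrices. The function names and the `sigma` bandwidth parameter are illustrative choices.

```python
import numpy as np

def gram_matrix(x, sigma=1.0):
    # Gaussian (RBF) Gram matrix of the samples in x (shape: n_samples x n_features).
    d2 = np.sum(x**2, axis=1, keepdims=True) - 2 * x @ x.T + np.sum(x**2, axis=1)
    return np.exp(-d2 / (2 * sigma**2))

def renyi_entropy(K, alpha=2.0):
    # Matrix-based Renyi alpha-entropy: S_alpha(A) = log2(tr(A^alpha)) / (1 - alpha),
    # where A is the trace-normalized (Hermitian, PSD) Gram matrix.
    A = K / np.trace(K)
    eigvals = np.linalg.eigvalsh(A)
    eigvals = eigvals[eigvals > 1e-12]  # drop numerically zero/negative eigenvalues
    return np.log2(np.sum(eigvals**alpha)) / (1 - alpha)

def joint_entropy(Ks, alpha=2.0):
    # Multivariate joint entropy: entropy of the Hadamard product of the
    # per-variable Gram matrices (trace-normalized inside renyi_entropy).
    H = Ks[0].copy()
    for K in Ks[1:]:
        H = H * K  # element-wise product
    return renyi_entropy(H, alpha)

def total_correlation(Ks, alpha=2.0):
    # Total correlation: sum of marginal entropies minus the joint entropy.
    return sum(renyi_entropy(K, alpha) for K in Ks) - joint_entropy(Ks, alpha)
```

With $n$ samples the entropy lies in $[0, \log_2 n]$, attained at the upper end when the normalized Gram matrix approaches $I/n$ (e.g., a very small kernel bandwidth), which gives a quick sanity check on an implementation.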
