A new histogram-based estimation technique for entropy and mutual information using mean squared error minimization

Mutual Information (MI) has been used extensively as a measure of similarity or dependence between random variables (or parameters) in many signal and image processing applications. However, MI estimation techniques are known to exhibit a large bias and a high Mean Squared Error (MSE), and can be computationally very costly. To overcome these drawbacks, we propose a novel, fast, low-MSE histogram-based technique for estimating entropy and mutual information. By minimizing the MSE directly, the estimator avoids the error-accumulation problem of traditional methods. We derive an expression for the optimal number of histogram bins for MI estimation with both continuous and discrete random variables. Experimental results on a speech recognition problem and a computer-aided diagnosis problem demonstrate the strength of the proposed approach in estimating the optimal number of selected features, with improved classification results compared to existing approaches.
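
To make the histogram plug-in idea concrete, here is a minimal Python sketch of the general approach the abstract describes: estimate I(X;Y) from a 2-D histogram whose bin count is set by an MSE-motivated rule. The paper's own derived optimal bin number is not reproduced in this abstract, so the Freedman-Diaconis rule (an L2-optimal bin-width criterion from the density-estimation literature) is used purely as a stand-in; the function names are illustrative, not the authors'.

```python
import numpy as np

def freedman_diaconis_bins(x):
    """MSE-motivated bin count (Freedman-Diaconis rule); a stand-in
    for the paper's derived optimal bin number, which is not given here."""
    x = np.asarray(x)
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    if iqr == 0:
        return 1
    width = 2.0 * iqr * len(x) ** (-1.0 / 3.0)   # h = 2*IQR*n^(-1/3)
    return max(1, int(np.ceil((x.max() - x.min()) / width)))

def mutual_information(x, y, bins=None):
    """Plug-in histogram estimate of I(X;Y) in nats."""
    if bins is None:
        bins = max(freedman_diaconis_bins(x), freedman_diaconis_bins(y))
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()                      # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)   # marginal of X, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of Y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))
```

As a quick sanity check, a strongly dependent pair yields a clearly positive estimate, while an independent pair yields a value near zero (up to the positive bias inherent to plug-in estimators, which the paper's method is designed to reduce):

```python
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = x + 0.5 * rng.normal(size=2000)                   # dependent pair
print(mutual_information(x, y))                       # clearly positive
print(mutual_information(x, rng.normal(size=2000)))   # near zero
```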
