Mutually Informed Correlation Coefficient (MICC) - a New Filter Based Feature Selection Method

Feature selection (FS) is a well-explored domain of data pre-processing and information theory. It is the process of selecting important features from high-dimensional feature vectors that may contain many redundant and/or non-informative features. In this paper, we propose a score-based filter FS approach called Mutually Informed Correlation Coefficient (MICC), which combines two popular statistical dependence measures, namely Mutual Information (MI) and Pearson Correlation Coefficient (PCC). We evaluate MICC on different variations of Local Binary Pattern (LBP) based feature vectors used for classifying the components of handwritten document images as text or non-text. We compare the results with some popular filter methods, namely Gini Index, T-test, and ReliefF, as well as with MI and PCC used individually. The results and the corresponding comparisons show that the proposed method not only performs FS efficiently but also improves the recognition accuracy on this classification problem. The code of the proposed algorithm can be found at this link: MICC.
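To make the idea of a score-based filter built from MI and PCC concrete, the following is a minimal sketch in plain Python. The abstract does not state how MICC combines the two measures, so the product of MI and |PCC| used in `combined_score` is purely an illustrative assumption and not the authors' formula. The function names, the equal-width binning used to estimate MI, and the `bins` parameter are likewise hypothetical choices made for this example.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant sequence is treated as uncorrelated
    return cov / math.sqrt(vx * vy)


def mutual_information(xs, ys, bins=4):
    """MI in nats between a feature (equal-width binned) and discrete labels."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant features
    binned = [min(int((x - lo) / width), bins - 1) for x in xs]
    n = len(xs)
    joint = Counter(zip(binned, ys))
    px, py = Counter(binned), Counter(ys)
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def combined_score(xs, ys, bins=4):
    """ASSUMPTION: combine the two measures as MI * |PCC| (not from the paper)."""
    return mutual_information(xs, ys, bins) * abs(pearson(xs, ys))


def rank_features(columns, labels, bins=4):
    """Return feature indices ordered from highest to lowest combined score."""
    scores = [combined_score(col, labels, bins) for col in columns]
    return sorted(range(len(columns)), key=lambda i: -scores[i])


# Toy data: three feature columns and binary labels (text = 1, non-text = 0).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # perfectly aligned with the labels
    [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the labels
    [5, 5, 5, 5, 5, 5, 5, 5],  # constant, carries no information
]
ranking = rank_features(columns, labels)
print(ranking)
```

As with any filter method, the score is computed per feature without training a classifier, so the top-ranked indices can be kept and the rest discarded before the text/non-text classifier is trained.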
