Feature selection with dynamic mutual information

Feature selection plays an important role in data mining and pattern recognition, especially for large-scale data. In recent years, various metrics have been proposed to measure the relevance between features. Because mutual information is nonlinear and can effectively represent the dependencies among features, it is one of the most widely used measures in feature selection. For this reason, many promising feature selection algorithms based on mutual information, each with different parameters, have been developed. In this paper, we first introduce a general criterion function for mutual information in feature selectors, which unifies most of the information measures used in previous algorithms. In traditional selectors, mutual information is estimated on the whole sampling space. This, however, cannot exactly represent the relevance among features. To cope with this problem, the second purpose of this paper is to propose a new feature selection algorithm based on dynamic mutual information, which is estimated only on unlabeled instances. To verify the effectiveness of our method, experiments are carried out on sixteen UCI datasets using four typical classifiers. The experimental results indicate that our algorithm achieves better results than other methods in most cases.
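
The contrast between estimating mutual information on the whole sampling space and re-estimating it on a shrinking set of instances can be illustrated with a minimal Python sketch. It is not the algorithm proposed in the paper: the names `mutual_information` and `select_features` are hypothetical, the features are assumed to be discrete, and the rule for dropping an instance (its value of the selected feature co-occurs with a single class among the remaining instances) is an assumption made only for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    if n == 0:
        return 0.0
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    # Sum p(x,y) * log2( p(x,y) / (p(x) p(y)) ) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def select_features(rows, labels, k):
    """Greedy selection that re-estimates MI on the remaining instances.

    Assumption (illustrative only): after a feature is chosen, an instance
    is dropped when its value of that feature occurs with exactly one class
    label among the instances still remaining.
    """
    remaining = list(range(len(rows)))
    candidates = set(range(len(rows[0]))) if rows else set()
    selected = []
    while candidates and remaining and len(selected) < k:
        ys = [labels[i] for i in remaining]

        def score(f):
            return mutual_information([rows[i][f] for i in remaining], ys)

        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(candidates), key=score)
        if score(best) <= 1e-12:
            break  # no remaining feature carries information about the class
        selected.append(best)
        candidates.remove(best)

        # Collect the class labels seen for each value of the chosen feature.
        labels_by_value = {}
        for i in remaining:
            labels_by_value.setdefault(rows[i][best], set()).add(labels[i])
        # Keep only instances whose feature value is still ambiguous.
        remaining = [
            i for i in remaining if len(labels_by_value[rows[i][best]]) > 1
        ]
    return selected
```

For example, `select_features(rows, labels, k=2)` takes a list of equal-length tuples of discrete feature values and a list of class labels, and returns the indices of the chosen features in selection order. Because the estimate is recomputed on fewer instances at each step, a feature that looks redundant on the full data can still score well on the instances the earlier features failed to separate.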
