MSc Project: Feature Selection Using Information-Theoretic Techniques

This document presents an investigation into three areas of feature selection using information-theoretic methods.

The first area concerns the selection of the first feature in common feature selection algorithms. This step is often overlooked when such algorithms are constructed, under the assumption that the most informative feature is the best first choice. This assumption can be proven false, so the first part of the research investigates how to select a better-suited first feature. New methods for selecting the first feature are proposed and empirically tested to determine whether they improve on the standard approach.

The second area applies the Rényi extension of information theory to standard feature selection techniques. This requires a Rényi mutual information measure, and two different measures are proposed. The Rényi extension provides a positive real parameter, α, which can be varied. The new Rényi feature selection techniques are empirically tested while varying both the measure and the value of α.

The third area investigates a different method of estimating the Rényi entropy of a variable, using a function based on the length of the minimal spanning tree of the sample. This yields a high-dimensional entropy estimate that can be constructed in O(s log s) time and can be used to implement the Max-Dependency criterion. The research investigates this estimator as a feature selection criterion and proposes a new genetic-algorithm-based method for selecting features, in contrast to the traditional forward search used in other algorithms.
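To make the first research area concrete, below is a minimal sketch of the standard greedy forward search (an mRMR-style relevance-minus-redundancy criterion) that proposed first-feature methods would be compared against. Note the step the project questions: when the selected set is empty, the redundancy term vanishes, so the first feature defaults to the single most informative one. This is illustrative Python under a discrete-variable assumption; the function names are not taken from the project.

import numpy as np

def mutual_information(x, y):
    # Shannon mutual information I(X;Y) in nats, estimated from the
    # empirical joint distribution of two discrete 1-D arrays.
    _, x_idx = np.unique(x, return_inverse=True)
    _, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((x_idx.max() + 1, y_idx.max() + 1))
    np.add.at(joint, (x_idx, y_idx), 1.0)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def forward_select(X, y, k):
    # Greedy forward search over the columns of X. The first pick is
    # always argmax_j I(X_j; Y), because the redundancy penalty is
    # empty: exactly the assumption the first research area tests.
    remaining = list(range(X.shape[1]))
    selected = []
    for _ in range(k):
        def score(j):
            relevance = mutual_information(X[:, j], y)
            if not selected:
                return relevance
            redundancy = np.mean([mutual_information(X[:, j], X[:, i])
                                  for i in selected])
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected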

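For reference, the Rényi entropy of order α that underlies the second research area is

H_\alpha(X) = \frac{1}{1-\alpha} \log \sum_{x} p(x)^{\alpha}, \qquad \alpha > 0, \; \alpha \neq 1,

which recovers the Shannon entropy H(X) = -\sum_x p(x) \log p(x) in the limit α → 1. The project proposes two Rényi mutual information measures that are not stated here; one natural construction, given purely as an illustration by analogy with the Shannon identity I(X;Y) = H(X) + H(Y) - H(X,Y), is

I_\alpha(X;Y) = H_\alpha(X) + H_\alpha(Y) - H_\alpha(X,Y),

though unlike the Shannon case this quantity is not guaranteed to be non-negative for all α, which is one reason the choice of measure matters empirically.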
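To make the third research area concrete, the sketch below implements a minimal-spanning-tree entropy estimator in the style of Hero and colleagues' entropic-graph work: for s sample points in d dimensions and γ = d(1 - α), the MST length L_γ with edge weights |e|^γ gives

\hat{H}_\alpha(X) = \frac{1}{1-\alpha} \left( \log \frac{L_\gamma(X)}{s^{\alpha}} - \log \beta_{d,\gamma} \right),

valid for 0 < α < 1 and d ≥ 2. The constant β_{d,γ} is distribution-free but must be approximated in practice, and the dense O(s²) MST construction used here is a simplification; the O(s log s) bound quoted above requires a sparse (for example k-nearest-neighbour) graph. This is an illustrative Python sketch, not the project's implementation.

import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def renyi_entropy_mst(X, alpha=0.5, beta=1.0):
    # Estimate H_alpha of a sample X with shape (s, d) from the length
    # of its Euclidean minimal spanning tree. `beta` stands in for the
    # distribution-free constant beta_{d,gamma}, assumed known here;
    # in practice it is approximated or calibrated on uniform samples.
    s, d = X.shape
    gamma = d * (1.0 - alpha)          # edge-weight exponent
    W = squareform(pdist(X)) ** gamma  # gamma-weighted distance matrix
    # Total gamma-weighted MST length. The dense matrix makes this
    # O(s^2); a k-NN sparse graph recovers the O(s log s) bound.
    L = minimum_spanning_tree(W).sum()
    return (np.log(L / s**alpha) - np.log(beta)) / (1.0 - alpha)

# Example: score a sample restricted to a candidate feature subset,
# e.g. as the fitness of one individual in a genetic algorithm.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 3))
    print(renyi_entropy_mst(X, alpha=0.5))

Because the estimate is computed over whole subsets rather than built up one feature at a time, a population-based search such as a genetic algorithm can evolve feature masks scored by this estimator directly, which is the motivation for preferring it over the greedy forward search.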