Identifying Feature Relevance Using a Random Forest

Feature selection and estimates of feature relevance are known to benefit both the performance and the interpretability of machine learning algorithms. Here we consider feature selection within a Random Forest framework. A feature selection technique is introduced that combines hypothesis testing with an approximation to the expected performance of an irrelevant feature during Random Forest construction. It is demonstrated that the lack of implicit feature selection within Random Forest has an adverse effect on both the accuracy and the efficiency of the algorithm. It is further shown that irrelevant features can slow the rate of error convergence, and a theoretical justification for this effect is given.
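
The paper's exact test statistic is not reproduced here; the following is a minimal sketch of the underlying idea, assuming a permutation-based null model: each feature's Random Forest importance is compared against the importance earned by deliberately irrelevant "shadow" features (independently permuted copies of the originals), and a feature is kept only when its importance significantly exceeds that null distribution. The dataset, trial count, and significance threshold are illustrative choices, not the authors' construction.

```python
# Hedged sketch: approximate the expected importance of an irrelevant
# feature by permuting each column (a "shadow" copy), then test whether
# each real feature's importance exceeds that empirical null.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=4, n_redundant=0, random_state=0)

n_trials = 30
null_importances = []   # importances of known-irrelevant (permuted) features
real_importances = []   # importances of the original features

for trial in range(n_trials):
    # Permute every column independently: the shadow features keep their
    # marginal distributions but lose any relationship with y.
    shadow = np.apply_along_axis(rng.permutation, 0, X)
    X_aug = np.hstack([X, shadow])
    forest = RandomForestClassifier(n_estimators=100, random_state=trial)
    forest.fit(X_aug, y)
    imp = forest.feature_importances_
    real_importances.append(imp[:X.shape[1]])
    null_importances.append(imp[X.shape[1]:])

real = np.mean(real_importances, axis=0)
null = np.concatenate(null_importances)

# One-sided empirical p-value: the fraction of null importances that
# reach or exceed each feature's mean importance.
p_values = np.array([(null >= r).mean() for r in real])
relevant = p_values < 0.05
for j, (r, p) in enumerate(zip(real, p_values)):
    print(f"feature {j}: importance={r:.4f}  p={p:.3f}  "
          f"{'relevant' if relevant[j] else 'irrelevant'}")
```

On the illustrative data above, the four informative features should receive small p-values while the noise features track the null, which mirrors the paper's point that irrelevant features earn non-trivial splits in a Random Forest unless they are explicitly tested against an irrelevance baseline.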
