Gradient boosted feature selection

A feature selection algorithm should ideally satisfy four conditions: reliably extract relevant features; be able to identify non-linear feature interactions; scale linearly with the number of features and dimensions; and allow the incorporation of known sparsity structure. In this work we propose a novel feature selection algorithm, Gradient Boosted Feature Selection (GBFS), which satisfies all four of these requirements. The algorithm is flexible, scalable, and surprisingly straightforward to implement, as it is based on a modification of gradient boosted trees. We evaluate GBFS on several real-world data sets and show that it matches or outperforms other state-of-the-art feature selection algorithms. Moreover, it scales to larger data sets and naturally allows for the incorporation of domain-specific side information.
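The key modification to gradient boosted trees is to make split selection feature-aware: splitting on a feature the ensemble has already selected is free, while splitting on a new feature incurs a penalty, so boosting prefers to reuse features it has already paid for. The sketch below is a hypothetical, simplified rendition of that idea in Python with scikit-learn, not the authors' implementation. It assumes squared loss, grows one shallow tree per round, considers at most one new feature per round via brute-force candidate search (for clarity, not efficiency), and charges a flat penalty `lam` for each feature opened for the first time; the function name `gbfs_sketch` and all parameter defaults are invented for illustration.

```python
# A minimal sketch of the GBFS idea, NOT the authors' implementation.
# Assumptions: squared loss, one new candidate feature per boosting
# round, and a flat per-feature penalty `lam` for newly opened features.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbfs_sketch(X, y, n_rounds=50, lam=0.05, lr=0.1, depth=3):
    """Stage-wise boosting that charges `lam` the first time a round
    splits on a feature outside the currently selected set."""
    n, d = X.shape
    F = np.zeros(n)                      # current additive ensemble
    selected = set()
    for _ in range(n_rounds):
        residual = y - F                 # negative gradient of squared loss
        # candidate feature sets: current set, or current set plus one new feature
        trials = ([sorted(selected)] if selected else []) + \
                 [sorted(selected | {j}) for j in range(d) if j not in selected]
        best = None
        for feats in trials:
            tree = DecisionTreeRegressor(max_depth=depth).fit(X[:, feats], residual)
            pred = tree.predict(X[:, feats])
            # loss reduction achieved by this candidate tree
            gain = np.mean(residual ** 2) - np.mean((residual - lr * pred) ** 2)
            # features the tree actually split on (negative values mark leaves)
            used = {feats[i] for i in tree.tree_.feature if i >= 0}
            score = gain - lam * len(used - selected)
            if best is None or score > best[0]:
                best = (score, used, lr * pred)
        score, used, update = best
        if score <= 0:                   # no candidate pays for its penalty
            break
        selected |= used
        F += update
    return sorted(selected), F
```

Calling `gbfs_sketch(X, y, lam=0.05)` on a standardized data matrix returns the indices of the selected features together with the ensemble's training-set predictions; raising `lam` trades accuracy for a smaller feature set, which is the knob GBFS exposes for controlling sparsity.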
