Principled Non-Linear Feature Selection

Recent non-linear feature selection approaches employing greedy optimisation of Centred Kernel Target Alignment(KTA) exhibit strong results in terms of generalisation accuracy and sparsity. However, they are computationally prohibitive for large datasets. We propose randSel, a randomised feature selection algorithm, with attractive scaling properties. Our theoretical analysis of randSel provides strong probabilistic guarantees for correct identification of relevant features. RandSel's characteristics make it an ideal candidate for identifying informative learned representations. We've conducted experimentation to establish the performance of this approach, and present encouraging results, including a 3rd position result in the recent ICML black box learning challenge as well as competitive results for signal peptide prediction, an important problem in bioinformatics.

[1]  Pavel Pudil,et al.  Fast dependency-aware feature selection in very-high-dimensional pattern recognition , 2011, 2011 IEEE International Conference on Systems, Man, and Cybernetics.

[2]  Jason Weston,et al.  Gene Selection for Cancer Classification using Support Vector Machines , 2002, Machine Learning.

[3]  Neil D. Lawrence,et al.  Advances in Neural Information Processing Systems 14 , 2002 .

[4]  Yoshua Bengio,et al.  Challenges in representation learning: A report on three machine learning contests , 2013, Neural Networks.

[5]  Bernhard Schölkopf,et al.  Measuring Statistical Dependence with Hilbert-Schmidt Norms , 2005, ALT.

[6]  Sayan Mukherjee,et al.  Feature Selection for SVMs , 2000, NIPS.

[7]  S. Brunak,et al.  SignalP 4.0: discriminating signal peptides from transmembrane regions , 2011, Nature Methods.

[8]  Mehryar Mohri,et al.  Algorithms for Learning Kernels Based on Centered Alignment , 2012, J. Mach. Learn. Res..

[9]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[10]  N. Cristianini,et al.  On Kernel-Target Alignment , 2001, NIPS.

[11]  Le Song,et al.  Feature Selection via Dependence Maximization , 2012, J. Mach. Learn. Res..

[12]  Julien Mairal,et al.  Proximal Methods for Sparse Hierarchical Dictionary Learning , 2010, ICML.

[13]  Andrew Y. Ng,et al.  Reading Digits in Natural Images with Unsupervised Feature Learning , 2011 .

[14]  John Shawe-Taylor,et al.  Multiple Kernel Learning on the Limit Order Book , 2010, WAPA.

[15]  Dieter Jahn,et al.  PrediSi: prediction of signal peptides and their cleavage positions , 2004, Nucleic Acids Res..

[16]  Jiquan Ngiam,et al.  Sparse Filtering , 2011, NIPS.

[17]  N. Meinshausen,et al.  Stability selection , 2008, 0809.2932.