A simple and efficient method for ranking variables according to their usefulness for learning

The selection of a subset of input variables is often based on first constructing a ranking that orders the variables according to a given relevance criterion. The objective is to linearize the search: instead of exploring all subsets, only the quality of subsets containing the top-ranked variables is estimated. This paper presents an algorithm devised to rank input variables according to their usefulness for a learning task. The algorithm combines simple and classical techniques, correlation and orthogonalization, yielding a fast ranker that also deals explicitly with redundancy among variables. In addition, the proposed ranker is endowed with a simple polynomial expansion of the input variables to cope with nonlinear problems. A comparison with state-of-the-art rankers shows that this combination of simple components yields high-quality rankings of input variables. The experimental validation is carried out on a wide range of artificial data sets, and the quality of the rankings is assessed in a ROC-inspired setting, which avoids estimates biased by any particular learning algorithm.
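The abstract does not spell out the algorithm, but a correlation-plus-orthogonalization ranker in the spirit of orthogonal least squares / Gram-Schmidt forward selection can be sketched as follows. This is a minimal sketch under that assumption, not the paper's exact method; the function name `gram_schmidt_rank` and its interface are illustrative.

```python
import numpy as np

def gram_schmidt_rank(X, y):
    """Greedy correlation-plus-orthogonalization ranking.

    At each step, pick the candidate column whose squared correlation
    with the current target residual is highest, then orthogonalize the
    residual and the remaining candidates against the chosen column, so
    that variables redundant with it lose their apparent relevance on
    later iterations.
    """
    X = np.asarray(X, dtype=float).copy()
    r = np.asarray(y, dtype=float).copy()
    d = X.shape[1]
    remaining = list(range(d))
    ranking = []
    eps = 1e-12  # guards against division by zero for explained columns
    for _ in range(d):
        # squared correlation of each remaining column with the residual
        scores = [
            (X[:, j] @ r) ** 2 / ((X[:, j] @ X[:, j]) * (r @ r) + eps)
            for j in remaining
        ]
        best = remaining[int(np.argmax(scores))]
        ranking.append(best)
        remaining.remove(best)
        # Gram-Schmidt step: project the chosen direction out of the
        # residual target and out of every remaining candidate column
        q = X[:, best]
        qn = np.linalg.norm(q)
        if qn > eps:
            q = q / qn
            r -= (q @ r) * q
            for j in remaining:
                X[:, j] -= (q @ X[:, j]) * q
    return ranking

# Toy usage: 5 relevant variables among 20
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = X[:, :5] @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
print(gram_schmidt_rank(X, y)[:5])  # ideally a permutation of 0..4
```

To mimic the polynomial expansion mentioned in the abstract, one could expand X with, say, degree-2 monomials before ranking and credit each expanded column back to its originating variable; the exact expansion the authors use is not specified in this abstract.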
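Because the validation uses artificial data sets whose relevant variables are known, the ROC-inspired assessment can be read as an AUC computed over the ranking itself: the probability that a randomly chosen relevant variable is ranked above a randomly chosen irrelevant one, the classical Hanley-McNeil interpretation of the area under the ROC curve. The sketch below assumes exactly this pairwise reading; the helper name `ranking_auc` is illustrative and the paper's precise protocol may differ.

```python
def ranking_auc(ranking, relevant):
    """AUC of a variable ranking against known ground truth.

    ranking:  list of variable indices, best-ranked first.
    relevant: collection of indices of the truly relevant variables.
    Returns the fraction of (relevant, irrelevant) pairs in which the
    relevant variable is ranked above the irrelevant one (1.0 = perfect).
    """
    relevant = set(relevant)
    pos = [rank for rank, v in enumerate(ranking) if v in relevant]
    neg = [rank for rank, v in enumerate(ranking) if v not in relevant]
    wins = sum(p < n for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# e.g. ranking_auc([0, 3, 1, 2, 4], relevant={0, 1}) -> 5/6 ~ 0.83
```

A ranking has no ties, so no tie correction is needed; and since this measure never trains a model on the ranked subsets, it matches the abstract's stated motivation of avoiding estimates biased by any particular learning algorithm.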
