Detecting relevant variables and interactions in supervised classification

The widely used Support Vector Machine (SVM) method has been shown to yield good results in supervised classification problems. When interpretability is an important issue, classification methods such as Classification and Regression Trees (CART) may be more attractive, since they are designed to detect the important predictor variables and, for each predictor variable, the critical values that are most relevant for classification. However, when interactions between variables strongly affect class membership, CART may yield misleading information. Extending previous work by the authors, this paper introduces an SVM-based method. The numerical experiments reported show that our method is competitive with SVM and CART in terms of misclassification rates and, at the same time, is able to detect critical values and variable interactions that are relevant for classification.
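
The paper's own method is not reproduced here; the following is a minimal sketch of the kind of baseline comparison the abstract describes, estimating misclassification rates for an SVM and a CART-style tree by cross-validation. The dataset (scikit-learn's built-in breast cancer data, standing in for a UCI benchmark), the hyperparameters, and the depth limit are illustrative assumptions, not values taken from the paper.

```python
# Illustrative sketch (not the paper's method): compare the two baselines
# named in the abstract -- an SVM and a CART-style decision tree -- by
# cross-validated misclassification rate on a benchmark dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)

models = {
    # RBF-kernel SVM; features are standardized first, as is customary.
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
    # CART; a depth limit (an assumption here) keeps the tree small enough
    # to read off the predictor variables and split values it uses.
    "CART": DecisionTreeClassifier(max_depth=3, random_state=0),
}

for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=10)  # 10-fold cross-validation
    print(f"{name}: misclassification rate = {1 - acc.mean():.3f}")

# CART's interpretability advantage: the fitted tree exposes its split
# variables and critical values directly, which is the information the
# SVM-based method of the paper also aims to provide.
feature_names = [str(f) for f in load_breast_cancer().feature_names]
tree = models["CART"].fit(X, y)
print(export_text(tree, feature_names=feature_names))
```

Note that the printed tree rules show per-variable critical values but, unlike the paper's method, give no direct account of variable interactions when they drive class membership.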
