Strongly agree or strongly disagree?: Rating features in Support Vector Machines

In linear classifiers, such as the Support Vector Machine (SVM), a score is associated with each feature and objects are assigned to classes based on the linear combination of the scores and the feature values. Inspired by discrete psychometric scales, which measure the extent to which a respondent agrees with a statement, we propose the Discrete Level Support Vector Machine (DILSVM), in which the feature scores can only take on a discrete number of values, defined by the so-called feature rating levels. The DILSVM classifier is interpretable and visually appealing, since it can be represented as a collection of Likert scales, one per feature, each rating the level of agreement with the positive class. To construct the DILSVM classifier, we propose a Mixed Integer Linear Programming approach, together with a collection of strategies to reduce the computational cost. Our numerical experiments show that the three-point and five-point DILSVM classifiers achieve accuracy comparable to that of the SVM, with a substantial gain in interpretability and visual appeal, as well as in sparsity, thanks to an appropriate choice of the feature rating levels.
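To make the idea concrete, the following is a minimal sketch of how a three-point DILSVM-style decision function operates. The level set {-1, 0, +1}, the score vector, and the function name are illustrative assumptions for exposition; the actual scores in the paper are chosen by the Mixed Integer Linear Programming model, not hand-picked as here.

```python
# Sketch of a three-point DILSVM-style decision function.
# Assumption: feature scores are restricted to the rating levels {-1, 0, +1}
# ("disagree" / "neutral" / "agree" with the positive class); the scores
# below are illustrative placeholders, not MILP-optimal values.

def dilsvm_predict(x, scores, intercept=0.0, levels=(-1, 0, 1)):
    """Classify x using discrete-level feature scores.

    x         : list of feature values
    scores    : per-feature scores, each drawn from `levels`
    intercept : bias term of the linear classifier
    returns   : +1 (positive class) or -1 (negative class)
    """
    assert all(s in levels for s in scores), "each score must lie on the rating scale"
    # Standard linear decision rule, but with discretized scores:
    decision = sum(s * xi for s, xi in zip(scores, x)) + intercept
    return 1 if decision >= 0 else -1


# Example: feature 1 "agrees" with the positive class, feature 2 is
# neutral (effectively dropped, which is where sparsity comes from),
# feature 3 "disagrees".
scores = [1, 0, -1]
label = dilsvm_predict([2.0, 5.0, 1.0], scores)  # 1*2.0 + 0*5.0 - 1*1.0 = 1.0 >= 0
```

Because neutral ("0") scores remove their features from the decision entirely, the discretization itself induces the sparsity gains reported in the experiments.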
