Information Theoretic Feature Crediting in Multiclass Support Vector Machines

Identifying relevant features for a classification task is an important issue in machine learning. In this paper, we present a feature crediting scheme for multiclass pattern recognition tasks that utilizes the ability of Support Vector Machines to generalize well in high-dimensional feature spaces. Support Vector learning identifies a small subset of the training data that is relevant for the classification task, and Support Vector Machines primarily tackle the binary classification problem. Our scheme uses these relevant examples to identify relevant features for multiclass classification. For this purpose we present and employ an information-theoretic measure of classifier performance. This measure addresses the key issue of the average rate of information delivered by the classifier. It provides immunity to sampling bias in the data and sensitivity to the pattern of errors made by the classifier. Empirical results on a number of datasets suggest that the scheme applies efficiently to data with a very large number of features.
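The abstract does not give the formula for the information-theoretic measure. As a rough illustration of the idea only, the sketch below assumes the measure resembles the mutual information, in bits, between true and predicted labels, estimated from a confusion matrix. The function name `mutual_information_bits` and the example matrices are hypothetical and are not taken from the paper.

```python
import math


def mutual_information_bits(confusion):
    """Estimate I(true; predicted) in bits from a confusion matrix.

    Assumption: rows are true classes, columns are predicted classes,
    and entries are raw counts. This is a generic mutual-information
    estimate, not necessarily the measure defined in the paper.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix has no examples")
    row_sums = [sum(row) for row in confusion]
    col_sums = [sum(col) for col in zip(*confusion)]

    mi = 0.0
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if count == 0:
                continue  # zero cells contribute nothing
            p_ij = count / total
            # p_ij / (p_i * p_j) simplifies to count * total / (row_i * col_j)
            ratio = count * total / (row_sums[i] * col_sums[j])
            mi += p_ij * math.log2(ratio)
    return mi


# A perfect classifier on two balanced classes delivers one bit per example.
perfect = [[5, 0], [0, 5]]
# A classifier that always predicts the majority class is 90% accurate here,
# yet its output carries no information about the true class.
majority = [[9, 0], [1, 0]]

print(mutual_information_bits(perfect))
print(mutual_information_bits(majority))
```

The second matrix shows why a measure of this kind is insensitive to class imbalance in a way plain accuracy is not: a high accuracy obtained by ignoring the minority class scores zero.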
