Fβ support vector machines

We introduce in this paper Fβ SVMs, a new parametrization of support vector machines. It allows an SVM to be optimized in terms of Fβ, a classical information retrieval criterion, instead of the usual classification rate. Experiments illustrate the advantages of this approach over the traditional 2-norm soft-margin SVM when precision and recall are of unequal importance. An automatic model selection procedure based on the generalization Fβ score is also introduced. It relies on the results of Chapelle, Vapnik et al. (4) on the use of gradient-based techniques in SVM model selection. The derivatives of an Fβ loss function with respect to the regularization constant C and the width σ of a Gaussian kernel are formally defined, and the model is then selected by performing a gradient descent of the Fβ loss function over the set of hyperparameters. Experiments on artificial and real-life data show the benefits of this method when the Fβ score is the criterion of interest.

I. INTRODUCTION

Support Vector Machines (SVMs), introduced by Vapnik (18), have been widely used in the field of pattern recognition for the last decade. The popularity of the method relies on its strong theoretical foundations as well as on its practical results. The performance of classifiers is usually assessed by means of the classification error rate or by Information Retrieval (IR) measures such as precision, recall, Fβ, the break-even point and ROC curves. Unfortunately, there is no direct connection between these IR criteria and the SVM hyperparameters: the regularization constant C and the kernel parameters. In this paper, we propose a novel method allowing the user to specify requirements in terms of the Fβ criterion.

First, the Fβ measure is reviewed as a user specification criterion in section II. A new SVM parametrization dealing with the β parameter is introduced in section III. A procedure for automatic model selection according to Fβ is then proposed in section IV; this procedure is a gradient-based technique derived from the results of Chapelle, Vapnik et al. (4). Finally, experiments on artificial and real-life data are presented in section V.

The two previous measures, precision and recall, can be combined into a single Fβ measure in which the parameter β specifies the relative importance of recall with respect to precision. Setting β = 0 amounts to considering precision only, whereas taking β = ∞ takes only recall into account; precision and recall are of equal importance when using the F1 measure. The contingency matrix and estimations of precision, recall and Fβ are given hereafter.

              Target: +1            Target: -1
   +1         True Pos. (#TP)       False Pos. (#FP)
   -1         False Neg. (#FN)      True Neg. (#TN)
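The precision, recall and Fβ estimators announced above are not reproduced in this excerpt. As a minimal sketch, the standard definitions are assumed here, since they reproduce the limiting behaviours described in the text (β = 0 recovers precision, β = ∞ recovers recall, β = 1 weights both equally):

\[
  \mathrm{precision} = \frac{\#TP}{\#TP + \#FP}, \qquad
  \mathrm{recall} = \frac{\#TP}{\#TP + \#FN},
\]
\[
  F_\beta = \frac{(1 + \beta^{2})\,\mathrm{precision}\cdot\mathrm{recall}}{\beta^{2}\,\mathrm{precision} + \mathrm{recall}}.
\]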

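To make the model-selection idea from the abstract concrete, the following is a minimal sketch and not the paper's method: instead of the analytic derivatives of the Fβ loss with respect to C and σ, it simply scans a log-scale grid of (C, σ) pairs and keeps the one with the best cross-validated Fβ score. The scikit-learn estimator and the imbalanced toy data set are assumptions made purely for illustration.

# Hypothetical illustration only: tune (C, sigma) of an RBF-kernel SVM against
# a cross-validated F_beta objective.  The paper performs a gradient descent on
# an F_beta loss; here a simple log-scale grid search stands in for it.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import fbeta_score
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVC

beta = 2.0                                   # recall twice as important as precision
X, y = make_classification(n_samples=400, weights=[0.8, 0.2], random_state=0)

best_params, best_score = None, -1.0
for C in np.logspace(-2, 3, 6):              # regularization constant C
    for sigma in np.logspace(-2, 2, 5):      # Gaussian kernel width sigma
        gamma = 1.0 / (2.0 * sigma ** 2)     # k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))
        y_hat = cross_val_predict(SVC(C=C, gamma=gamma), X, y, cv=5)
        score = fbeta_score(y, y_hat, beta=beta)
        if score > best_score:
            best_params, best_score = (C, sigma), score

print("best (C, sigma):", best_params, "cross-validated F_beta:", best_score)

In the paper itself, this outer loop is replaced by a gradient descent that uses the formally defined derivatives of the Fβ loss with respect to C and σ.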
REFERENCES

[1] Carl D. Meyer, et al. Matrix Analysis and Applied Linear Algebra, 2000.

[2] Vladimir N. Vapnik, et al. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.

[3] Vladimir Vapnik, et al. Statistical learning theory, 1998.

[4] Thorsten Joachims, et al. Making large-scale support vector machine learning practical, 1999.

[5] R. C. Williamson, et al. Generalization Bounds via Eigenvalues of the Gram matrix, 1999.

[6] Sandro Ridella, et al. Model selection in top quark tagging with a support vector classifier, 2004, 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541).

[7] Nello Cristianini, et al. Controlling the Sensitivity of Support Vector Machines, 1999.

[8] Alexander J. Smola, et al. Learning with kernels, 1998.

[9] Fabrizio Sebastiani, et al. Machine learning in automated text categorization, 2001, CSUR.

[10] S. Sathiya Keerthi, et al. A fast iterative nearest point algorithm for support vector machine classifier design, 2000, IEEE Trans. Neural Networks Learn. Syst.

[11] Corinna Cortes, et al. Support-Vector Networks, 1995, Machine Learning.

[12] Thorsten Joachims, et al. Estimating the Generalization Performance of an SVM Efficiently, 2000, ICML.

[13] R. Vanderbei. LOQO: an interior point code for quadratic programming, 1999.

[14] S. Sathiya Keerthi, et al. Evaluation of simple performance measures for tuning SVM hyperparameters, 2003, Neurocomputing.

[15] Roger Fletcher, et al. A Rapidly Convergent Descent Method for Minimization, 1963, Comput. J.

[16] Nello Cristianini, et al. An Introduction to Support Vector Machines, 2000.

[17] Dustin Boswell, et al. Introduction to Support Vector Machines, 2002.

[18] Stephen Kwek, et al. Applying Support Vector Machines to Imbalanced Datasets, 2004, ECML.

[19] Chih-Jen Lin, et al. Radius Margin Bounds for Support Vector Machines with the RBF Kernel, 2002, Neural Computation.

[20] V. Vapnik, et al. Bounds on Error Expectation for Support Vector Machines, 2000, Neural Computation.

[21] Catherine Blake, et al. UCI Repository of machine learning databases, 1998.

[22] Sayan Mukherjee, et al. Choosing Multiple Parameters for Support Vector Machines, 2002, Machine Learning.