The Effect of Attribute Scaling on the Performance of Support Vector Machines

This paper presents empirical results showing that simple attribute scaling in the data preprocessing stage can improve the performance of linear binary classifiers. In particular, a class-specific scaling method that utilises information about the class distribution of the training sample can significantly improve classification accuracy. This form of scaling can boost the performance of a simple centroid classifier to levels of accuracy similar to those of the more complex, and computationally expensive, support vector machine and regression classifiers. Further, when SVMs are used, scaled data produces better results than unscaled data, with smaller amounts of training data and with smaller values of the regularisation constant.
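As a rough illustration of why attribute scaling matters for a centroid classifier, the sketch below compares the same classifier on raw and on standardised attributes. It is not the paper's method: the abstract does not define the class-specific scaling, so plain per-attribute standardisation (zero mean, unit variance) stands in for it. The toy data and the helper names `fit_scaler`, `fit_centroid`, `predict` and `accuracy` are all assumptions made for this example.

```python
import numpy as np

def fit_scaler(X):
    """Per-attribute mean and standard deviation (assumed stand-in scaling)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid division by zero on constant attributes
    return mean, std

def fit_centroid(X, y):
    """Linear centroid classifier: the weight vector joins the two class means,
    and the threshold sits at the midpoint between them."""
    mu_pos = X[y == 1].mean(axis=0)
    mu_neg = X[y == -1].mean(axis=0)
    w = mu_pos - mu_neg
    b = -0.5 * np.dot(w, mu_pos + mu_neg)
    return w, b

def predict(X, w, b):
    return np.where(X @ w + b >= 0, 1, -1)

def accuracy(X, y, w, b):
    return float(np.mean(predict(X, w, b) == y))

# Toy data (made up): attribute 0 is informative but small in magnitude,
# attribute 1 is mostly noise but large in magnitude.
X = np.array([
    [1.0, 100.0], [1.2, -100.0], [0.8, 50.0], [1.0, -50.0],     # class +1
    [-1.0, 120.0], [-1.2, -80.0], [-0.8, 40.0], [-1.0, -60.0],  # class -1
])
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])

# Centroid classifier on the raw attributes.
w_raw, b_raw = fit_centroid(X, y)
acc_raw = accuracy(X, y, w_raw, b_raw)

# Same classifier after standardising each attribute.
mean, std = fit_scaler(X)
X_scaled = (X - mean) / std
w_scaled, b_scaled = fit_centroid(X_scaled, y)
acc_scaled = accuracy(X_scaled, y, w_scaled, b_scaled)

print("raw:", acc_raw, "scaled:", acc_scaled)
```

On this toy sample the large-magnitude noise attribute dominates the raw centroid direction, while standardisation lets the informative attribute drive the decision. Accuracy is measured on the training points only, so the numbers show the mechanism and say nothing about generalisation.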
