Boosting SVM Classifiers with Logistic Regression

The support vector machine (SVM) classifier is a linear maximum-margin classifier that performs well in many classification applications. Although it can be extended to nonlinear cases through the use of kernels, it may still suffer from heterogeneity in the training examples. Since there is little theory in the literature to guide the choice of kernel function, kernel selection is usually done by trial and error. When the training set is imbalanced, the data may not be linearly separable in the feature space defined by the chosen kernel. In this paper, we propose a hybrid method that integrates “small” support vector machine classifiers through logistic regression models. By appropriately partitioning the training set, this ensemble classifier can outperform the SVM classifier trained on the whole training set at once. With this method, we not only avoid the difficulty caused by heterogeneity, but also obtain probability outputs for all examples. Moreover, it is less ambiguous than classifiers combined with voting schemes. From our simulation studies and empirical results, we find that such hybrid SVM classifiers are robust in the following sense: (1) they improve the performance (prediction accuracy) of the SVM classifier trained on the whole training set when there is some heterogeneity in the training examples; and (2) they are at least as good as the original SVM classifier when no heterogeneity is actually present in the training examples. We also extend this hybrid method to multi-class problems by replacing the binary logistic regression models with polychotomous logistic regression models; moreover, the polychotomous regression model can be constructed from individual binary logistic regression models.
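
The paper itself gives no code; the following is a minimal sketch of the idea in Python with scikit-learn, assuming a stratified random partition in place of whatever partitioning scheme the authors actually use. Each partition trains a “small” SVM, and a logistic regression fit on the SVMs' decision values supplies the combined, probabilistic prediction. For multi-class problems, the final logistic regression could be replaced by a multinomial (polychotomous) model in the same way.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for a (possibly heterogeneous) training set.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# 1. Partition the training set. A stratified split keeps both classes
#    in every partition; the paper's own partitioning may differ
#    (e.g. clustering-based).
k = 3
parts = [idx for _, idx in
         StratifiedKFold(n_splits=k, shuffle=True,
                         random_state=0).split(X_tr, y_tr)]

# 2. Train one "small" SVM per partition.
svms = [SVC(kernel="rbf").fit(X_tr[idx], y_tr[idx]) for idx in parts]

# 3. Feed the SVMs' decision values into a logistic regression, which
#    integrates them and yields probability outputs for every example.
def svm_scores(X):
    return np.column_stack([s.decision_function(X) for s in svms])

combiner = LogisticRegression().fit(svm_scores(X_tr), y_tr)

probs = combiner.predict_proba(svm_scores(X_te))  # probabilistic outputs
print(f"hybrid accuracy: {combiner.score(svm_scores(X_te), y_te):.3f}")
```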
