One-Sided Fuzzy SVM Based on Sphere for Imbalanced Data Sets Learning

Learning from imbalanced data sets presents a new challenge to the machine learning community, because traditional algorithms are biased toward the majority classes and yield poor detection rates on the minority classes. This paper presents a one-sided fuzzy support vector machine algorithm based on a sphere to improve classification performance on the minority class. First, the approach obtains the minimal hypersphere enclosing the majority class; second, it uses the center and radius of this hypersphere to assign fuzzy memberships to the majority instances, which reduces the influence of noisy and redundant majority instances during classification. Experiments show that, compared with other methods, the new approach improves classification performance on the minority class more effectively and also improves performance on the data set as a whole.
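The abstract does not state how the minimal hypersphere is computed or which membership function is derived from its center and radius. The Python sketch below fills both gaps with labeled assumptions, so it is an illustration rather than the paper's method. It approximates the minimal enclosing sphere of the majority class in input space (no kernel, no slack variables) with the Badoiu-Clarkson iteration, a swapped-in solver. It then gives each majority instance the hypothetical membership s_i = 1 - d_i / (R + delta), where d_i is the instance's distance to the center, R is the radius, and delta is a small positive constant. Minority instances are assumed to keep membership 1. The names approx_min_sphere, one_sided_memberships and per_sample_penalties are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def approx_min_sphere(points: List[Point], iters: int = 1000) -> Tuple[List[float], float]:
    """Approximate the minimal enclosing sphere of `points`.

    ASSUMPTION: uses the Badoiu-Clarkson iteration in input space (no kernel,
    no slack variables) in place of whatever solver the paper uses. The center
    repeatedly takes a shrinking step toward the current farthest point; roughly
    1/eps**2 iterations give a radius within a (1 + eps) factor of the optimum.
    """
    center = [float(v) for v in points[0]]
    for k in range(1, iters + 1):
        far = max(points, key=lambda p: _dist(center, p))
        step = 1.0 / (k + 1)
        center = [c + step * (f - c) for c, f in zip(center, far)]
    # Radius = largest distance from the final center, so every point lies inside.
    radius = max(_dist(center, p) for p in points)
    return center, radius


def one_sided_memberships(
    majority: List[Point], minority: List[Point], delta: float = 1e-6, iters: int = 1000
) -> Tuple[List[float], List[float]]:
    """Sphere-based memberships for the majority class, 1.0 for the minority class.

    ASSUMPTION (hypothetical formula): s_i = 1 - d_i / (R + delta). Because
    d_i <= R, every s_i lies in (0, 1]; instances near the sphere surface,
    where outliers end up, receive the smallest weights.
    """
    center, radius = approx_min_sphere(majority, iters)
    s_major = [1.0 - _dist(center, x) / (radius + delta) for x in majority]
    s_minor = [1.0] * len(minority)  # assumed: minority instances are not down-weighted
    return s_major, s_minor


def per_sample_penalties(memberships: List[float], C: float = 1.0) -> List[float]:
    """Fuzzy SVM penalty C_i = s_i * C for each training instance."""
    return [C * s for s in memberships]


if __name__ == "__main__":
    majority = [(-1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, -1.0), (0.0, 0.0)]
    minority = [(2.0, 2.0)]
    s_major, s_minor = one_sided_memberships(majority, minority)
    for x, s in zip(majority + minority, s_major + s_minor):
        print(x, round(s, 3))
```

In a fuzzy SVM each membership scales the misclassification penalty of its instance (C_i = s_i C), so majority instances far from the sphere center contribute least to the decision boundary while minority instances keep the full penalty. If scikit-learn is available, passing the memberships as `sample_weight` to `sklearn.svm.SVC.fit` applies this per-sample rescaling of C; the sketch itself does not depend on it.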
