Twin Bounded Weighted Relaxed Support Vector Machines

Data distribution plays an important role in classification. The imbalanced data problem arises when one class, usually the class of greater interest, has far fewer samples than the other class. The presence of outliers and noise makes classifying such data even more challenging. Despite these challenges, fast classification with good performance is still desired. One successful classifier for imbalanced data with outliers is the weighted relaxed support vector machine (WRSVM). This paper introduces an improved twin version of this classifier, called the twin-bounded weighted relaxed support vector machine, to address these challenges; it is also significantly faster and, in most cases, more accurate. The method combines the fast classification of twin-bounded support vector machines with the outlier robustness of WRSVM on imbalanced problems. Experimentally, the proposed method is compared with WRSVM and other standard SVM-based methods on public benchmark datasets. The results confirm the efficiency of the proposed method.
