A Novel Classifier - Weighted Features Cost-Sensitive SVM

Learning effective and efficient classifiers for imbalanced data is one of the ten challenging problems in data mining research. Classification of imbalanced data is an active area of machine learning and data mining, with important applications such as cancer diagnosis, credit card fraud detection and intrusion detection. Existing work on imbalanced data classification falls into three main approaches: resampling the data, modifying the learning algorithm itself, and cost-sensitive learning. Feature weighting methods can also improve classification accuracy. In this paper, we propose a novel method that combines feature weighting with cost-sensitive learning to handle imbalanced data. Assigning a weight to each feature rescales the corresponding coordinate and therefore changes the position of each instance in the feature space. Our goal is to use these weights to enlarge the region around the separating boundary, thereby increasing the separation between the classes. Because the margin is enlarged, minority-class instances are less likely to be misclassified as majority-class instances. The resulting Weighted Features Cost-Sensitive SVM (WF-CSSVM) performs well in terms of both accuracy and cost. In experiments on UCI datasets, most of the datasets are classified perfectly. Accuracy, G-mean and ROC are used as evaluation metrics.
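The abstract does not say how the feature weights or the misclassification costs are obtained, so the following is only a minimal sketch of the general idea under stated assumptions, not the WF-CSSVM algorithm itself. Each feature is rescaled by a hand-picked weight (the `feature_weights` vector is hypothetical), and a linear soft-margin SVM is trained with a larger hinge-loss penalty `c_pos` on the minority class than `c_neg` on the majority class. Training uses plain stochastic subgradient descent instead of a QP solver, and G-mean is computed as the square root of sensitivity times specificity. All function names are illustrative.

```python
import math
import random


def weight_features(X, feature_weights):
    """Multiply feature j of every instance by feature_weights[j].

    Rescaling a coordinate moves each instance in the input space.
    The weights passed in below are hypothetical, hand-picked values.
    """
    return [[fw * x for fw, x in zip(feature_weights, row)] for row in X]


def train_cost_sensitive_svm(X, y, c_pos, c_neg, lam=0.01, lr=0.05, epochs=200, seed=0):
    """Linear soft-margin SVM with class-dependent misclassification costs.

    Minimizes lam/2 * ||w||^2 + (1/n) * sum_i C_i * max(0, 1 - y_i * (w . x_i + b))
    by stochastic subgradient descent, where C_i = c_pos for y_i = +1 (minority)
    and C_i = c_neg for y_i = -1 (majority). The bias b is not regularized.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            xi, yi = X[i], y[i]
            cost = c_pos if yi == 1 else c_neg
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Gradient step on the L2 regularizer shrinks w toward zero.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Hinge loss is active: move w and b toward classifying x_i
                # correctly, scaled by the cost of this instance's class.
                w = [wj + lr * cost * yi * xj for wj, xj in zip(w, xi)]
                b += lr * cost * yi
    return w, b


def predict(w, b, X):
    """Return +1 or -1 for each row by the sign of w . x + b."""
    return [1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1 for x in X]


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (minority recall) and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy imbalanced data: six majority (-1) instances and two minority (+1) instances.
X = [[-2.0, 0.0], [-2.0, 1.0], [-3.0, 0.0], [-3.0, 1.0], [-2.5, 0.5], [-4.0, 0.0],
     [2.0, 0.0], [3.0, 1.0]]
y = [-1, -1, -1, -1, -1, -1, 1, 1]

Xw = weight_features(X, [1.0, 0.5])  # hypothetical feature weights
# Cost ratio set to the imbalance ratio (6:2), a common heuristic, not the paper's choice.
w, b = train_cost_sensitive_svm(Xw, y, c_pos=3.0, c_neg=1.0)
y_pred = predict(w, b, Xw)
print("G-mean:", g_mean(y, y_pred))
```

Subgradient descent keeps the sketch dependency-free. In practice one would more likely use an off-the-shelf SVM solver that accepts per-class penalties, such as the `class_weight` parameter of scikit-learn's `SVC`.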
