Multiclass Capped ℓp-Norm SVM for Robust Classifications

The support vector machine (SVM) is one of the most successful machine learning methods and has been applied to numerous real-world applications. Because SVM methods use the hinge loss or squared hinge loss function for classification, they usually outperform other classification approaches, e.g., methods based on the least squares loss. However, like most supervised learning algorithms, they learn classifiers from the labeled training data without any specific strategy for handling noisy data. In many real-world applications, the training set often contains outliers, which can misguide classifier learning and make the classification performance suboptimal. To address this problem, we propose a novel capped ℓp-norm SVM classification model that uses a capped ℓp-norm based hinge loss in the objective, which can handle both light and heavy outliers. We use the new formulation to naturally build the multiclass capped ℓp-norm SVM. More importantly, we derive novel optimization algorithms to efficiently minimize the capped ℓp-norm based objectives, and rigorously prove their convergence. We present experimental results showing that the new capped ℓp-norm SVM method consistently improves classification performance, especially as the noise level of the data increases.

Introduction

As one of the most fundamental problems in data mining, classification has numerous applications in areas such as information retrieval (Cao et al. 2009; Sriram et al. 2010), computer vision (Krizhevsky, Sutskever, and Hinton 2012), bioinformatics (Brown et al. 2000), medical image computing (Chen, Daponte, and Fox 1989), and natural language processing (Wang and Manning 2012). Given training data from multiple classes, the classification task is to learn classifiers in a supervised way and to assign each test example to its correct class.
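To make the capped ℓp-norm hinge loss mentioned above concrete, the following is a minimal per-sample sketch, assuming the capped loss takes the form min(max(0, 1 − y·f(x))^p, ε); the parameter names `p` and `eps` are hypothetical and the paper's exact parameterization may differ.

```python
import numpy as np

def capped_lp_hinge(scores, labels, p=1.0, eps=2.0):
    """Per-sample capped l_p hinge loss: min(max(0, 1 - y*f)^p, eps).

    `p` (the norm power) and `eps` (the cap) are illustrative names,
    not taken from the paper's notation.
    """
    hinge = np.maximum(0.0, 1.0 - labels * scores)  # standard hinge residual
    return np.minimum(hinge ** p, eps)              # cap bounds each sample's penalty

# A correctly classified sample with margin >= 1 contributes 0;
# a heavy outlier's loss is capped at eps instead of growing without bound.
scores = np.array([2.0, 0.5, -5.0])   # f(x_i)
labels = np.array([1.0, 1.0, 1.0])    # y_i
losses = capped_lp_hinge(scores, labels, p=1.0, eps=2.0)
# losses -> [0.0, 0.5, 2.0]; the uncapped hinge for the outlier would be 6.0
```

The cap is what limits an outlier's influence: a sample with a very large residual contributes at most `eps` to the objective, whereas under the plain hinge loss its contribution grows linearly with the residual.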
Many classification models have been proposed in the literature. Among them, the support vector machine (SVM) (Boser, Guyon, and Vapnik 1992) is one of the most successful classification models and has been applied to a wide variety of applications. One of the main reasons that SVM models (Mangasarian 2002; Keerthi and DeCoste 2005; Lin, Weng, and Keerthi 2008; Chang, Hsieh, and Lin 2008; Hsieh et al. 2008) outperform other classification methods is their unilateral loss function, e.g., the hinge loss or squared hinge loss. A unilateral loss is more suitable for classification tasks than the bilateral losses used in regression models. However, like most supervised learning algorithms, existing SVM models learn classifiers from the labeled training data without considering the noise problem. In many real-world applications, the training set often contains outliers, e.g., incorrectly labeled data, or data significantly different from other data in the same class. These outliers can mislead classifier training, so that the learned classifiers are not optimal and the classification performance is reduced. A robust classification model is therefore desired for real-world classification tasks. Although sparse learning models have been applied to SVM methods in the literature, such as the ℓ1-SVM (Bradley and Mangasarian 1998), the ℓ2,1-SVM (Cai et al. 2011), the Hybrid Huberized SVM (Wang, Zhu, and Zou 2007), and the Sparse SVM (Cotter, Shalev-Shwartz, and Srebro 2013), these methods mainly focus on selecting significant features or reducing the number of support vectors to improve classification.

∗Corresponding author. This work was partially supported by the following grants: NSF-IIS 1302675, NSF-IIS 1344152, NSF-DBI 1356628, NSF-IIS 1619308, NSF-IIS 1633753, NIH R01 AG049371. Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
None of them is specifically designed to handle data outliers for robust classification. To address this challenging problem, we propose a novel capped ℓp-norm SVM classification model that uses a capped ℓp-norm based loss function. Unlike the hinge loss and squared hinge loss used in existing SVM methods, which are not robust to data outliers, the capped ℓp-norm is theoretically robust to both light and heavy outliers. Because data outliers usually have large residues, the capped norm helps the model eliminate these outliers during training. For multiclass classification, existing research often uses the one-vs-one or one-vs-rest strategy to apply the binary SVM classifier to multiclass tasks, but the label ambiguity problem in such settings is well known. Thus, we also introduce a new formulation for the multiclass SVM model, which can directly solve the multiclass classification problem. Based on the new multiclass SVM formulation, we can naturally build the multiclass capped ℓp-norm SVM.

Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17)
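The contrast between one-vs-rest decomposition and a direct multiclass formulation can be illustrated by the decision rule: a direct multiclass linear SVM compares all class scores jointly and predicts argmax over classes, avoiding the label ambiguity that arises when independently trained binary classifiers disagree. Below is a generic sketch, not the paper's exact parameterization.

```python
import numpy as np

def multiclass_predict(W, X):
    """Direct multiclass linear decision rule: predict argmax_k w_k^T x.

    W is a (d, c) weight matrix with one column per class; this is a
    generic illustration of the joint argmax rule.
    """
    return np.argmax(X @ W, axis=1)  # the class with the highest score wins

# Two classes in 2-D: each sample goes to the class whose score is largest.
W = np.array([[1.0, 0.0],
              [0.0, 1.0]])          # columns: class-0 and class-1 weights
X = np.array([[2.0, 1.0],
              [0.0, 3.0]])          # rows: two test samples
preds = multiclass_predict(W, X)    # -> [0, 1]
```

Because every prediction is a single argmax over all class scores, exactly one class is chosen per sample; with one-vs-rest, by contrast, zero or several binary classifiers may claim the same sample.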

[1] Enhong Chen, et al. Context-aware query classification, 2009, SIGIR.

[2] Hakan Ferhatosmanoglu, et al. Short text classification in Twitter to improve information filtering, 2010, SIGIR.

[3] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[4] Jieping Ye, et al. Multi-stage multi-task feature learning, 2012, J. Mach. Learn. Res.

[5] Li Wang, et al. Hybrid huberized support vector machines for microarray classification, 2007, ICML.

[6] Chih-Jen Lin, et al. Trust region Newton method for logistic regression, 2008, J. Mach. Learn. Res.

[7] Olvi L. Mangasarian, et al. A finite Newton method for classification, 2002, Optim. Methods Softw.

[8] A. Asuncion, et al. UCI Machine Learning Repository, University of California, Irvine, School of Information and Computer Sciences, 2007.

[9] Tong Zhang, et al. Multi-stage convex relaxation for learning with sparse regularization, 2008, NIPS.

[10] S. P. Fodor. DNA sequencing: massively parallel genomics, 1997, Science.

[11] Johan A. K. Suykens, et al. Least Squares Support Vector Machines, 2002.

[12] S. Sathiya Keerthi, et al. A modified finite Newton method for fast solution of large scale linear SVMs, 2005, J. Mach. Learn. Res.

[13] Feiping Nie, et al. Robust capped norm nonnegative matrix factorization: capped norm NMF, 2015, CIKM.

[14] Feiping Nie, et al. Robust dictionary learning with capped ℓ1-norm, 2015, IJCAI.

[15] Jieping Ye, et al. Robust principal component analysis via capped norms, 2013, KDD.

[16] M. Fox, et al. Fractal feature analysis and classification in medical imaging, 1989, IEEE Trans. Medical Imaging.

[17] J. Welsh, et al. Molecular classification of human carcinomas by use of gene expression signatures, 2001, Cancer Research.

[18] Feiping Nie, et al. Discriminative least squares regression for multiclass classification and feature selection, 2012, IEEE Trans. Neural Networks and Learning Systems.

[19] Tong Zhang, et al. Analysis of multi-stage convex relaxation for sparse regularization, 2010, J. Mach. Learn. Res.

[20] Nathan Srebro, et al. Learning optimally sparse support vector machines, 2013, ICML.

[21] D. Haussler, et al. Knowledge-based analysis of microarray gene expression data by using support vector machines, 2000, Proc. Natl. Acad. Sci. USA.

[22] Chih-Jen Lin, et al. A dual coordinate descent method for large-scale linear SVM, 2008, ICML.

[23] Bernhard E. Boser, et al. A training algorithm for optimal margin classifiers, 1992, COLT.

[24] E. Lander, et al. Gene expression correlates of clinical prostate cancer behavior, 2002, Cancer Cell.

[25] Christopher D. Manning, et al. Baselines and bigrams: simple, good sentiment and topic classification, 2012, ACL.

[26] Feiping Nie, et al. Multi-class ℓ2,1-norm support vector machine, 2011, IEEE ICDM.

[27] Paul S. Bradley, et al. Feature selection via concave minimization and support vector machines, 1998, ICML.

[28] E. Lander, et al. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses, 2001, Proc. Natl. Acad. Sci. USA.

[29] Feiping Nie, et al. Robust and effective metric learning using capped trace norm, 2016, KDD.

[30] Cho-Jui Hsieh, et al. Coordinate descent method for large-scale ℓ2-loss linear SVM, 2008.