A Transductive Support Vector Machine with adjustable quasi-linear kernel for semi-supervised data classification

This paper addresses the semi-supervised classification problem using a Transductive Support Vector Machine (TSVM). A traditional TSVM first trains an SVM model on the labeled data, then uses that model to predict labels for the unlabeled data, and finally optimizes the unlabeled-data predictions to retrain the SVM. Throughout this optimization procedure the TSVM relies on a predefined kernel with fixed parameters, and it also suffers from a potential over-fitting problem. In this paper we introduce the proposed quasi-linear kernel into the TSVM. An SVM with a quasi-linear kernel realizes an approximate nonlinear separation boundary by interpolating multiple local linear boundaries. Applying the quasi-linear kernel to semi-supervised classification avoids potential over-fitting and provides more accurate predictions for the unlabeled data. After the unlabeled-data predictions have been optimized, the quasi-linear kernel can be further adjusted by using the potential boundary data distribution as prior knowledge. We also introduce a minimal set method for optimizing the unlabeled-data predictions. The minimal set method follows the clustering assumption of semi-supervised learning, and pairwise label switching is allowed between minimal sets. This speeds up the optimization procedure and reduces the influence of the label constraint in the TSVM. Experimental results on benchmark gene datasets show that the proposed method is effective and improves classification performance.
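To make the train-predict-retrain loop described above concrete, the following Python sketch shows a basic TSVM-style self-labeling procedure. It uses scikit-learn's SVC with an RBF kernel as a stand-in for the paper's quasi-linear kernel (which is not available off the shelf), and it omits the minimal set construction and pairwise label switching; the function name, the linearly increasing unlabeled-data weight, and the stopping rule are illustrative assumptions rather than the paper's exact algorithm.

```python
# Minimal self-labeling sketch of the TSVM retraining loop (assumptions noted above).
import numpy as np
from sklearn.svm import SVC

def tsvm_sketch(X_lab, y_lab, X_unl, n_rounds=10, C_lab=1.0, C_unl_max=1.0):
    """Iteratively (re)label the unlabeled pool and retrain the SVM."""
    clf = SVC(kernel="rbf", C=C_lab)
    clf.fit(X_lab, y_lab)                       # step 1: train on labeled data only
    y_unl = clf.predict(X_unl)                  # step 2: initial guess for unlabeled data

    for t in range(1, n_rounds + 1):
        # gradually increase the weight given to the (still uncertain) unlabeled examples
        C_unl = C_unl_max * t / n_rounds
        X_all = np.vstack([X_lab, X_unl])
        y_all = np.concatenate([y_lab, y_unl])
        w = np.concatenate([np.full(len(y_lab), C_lab),
                            np.full(len(y_unl), C_unl)])
        clf = SVC(kernel="rbf", C=1.0)
        clf.fit(X_all, y_all, sample_weight=w)  # step 3: retrain on labeled + guessed labels

        y_new = clf.predict(X_unl)              # step 4: update the guessed labels
        if np.array_equal(y_new, y_unl):        # stop once the labeling is stable
            break
        y_unl = y_new
    return clf, y_unl
```

In the paper's method, the label updates in step 4 would instead be restricted to pairwise switches between minimal sets, and the kernel itself would be adjusted using the estimated boundary data distribution rather than kept fixed.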
