Joint feature re-extraction and classification using an iterative semi-supervised support vector machine algorithm

Abstract The focus of this paper is on joint feature re-extraction and classification in cases where the training data set is small. An iterative semi-supervised support vector machine (SVM) algorithm is proposed, in which each iteration consists of both feature re-extraction and classification, and the feature re-extraction is based on the classification results from the previous iteration. Feature extraction is first discussed in the framework of Rayleigh coefficient maximization. The effectiveness of the common spatial pattern (CSP) feature, which is widely used in electroencephalogram (EEG) data analysis and EEG-based brain-computer interfaces (BCIs), can be explained by Rayleigh coefficient maximization. Two other features are also defined using the Rayleigh coefficient. These features are effective for discriminating two classes with different means or different variances. Extracting features by Rayleigh coefficient maximization generally requires a large labeled training data set; otherwise, the extracted features are not reliable. We therefore present an iterative semi-supervised SVM algorithm with embedded feature re-extraction, which can extract these three features reliably and perform classification simultaneously when the training data set is small. Each iteration is composed of two main steps: (i) the training data set is updated/augmented using unlabeled test data with their predicted labels, and features are re-extracted based on the augmented training data set; (ii) the re-extracted features are classified by a standard SVM. For parameter setting and model selection, we also propose a semi-supervised learning-based method using the Rayleigh coefficient, in which both training data and test data are used. This method is suitable when cross-validation model selection may not work for a small training data set.
Finally, the results of data analysis are presented to demonstrate the validity of our approach.
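The two-step iteration described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: for self-containment the SVM classifier is replaced by a simple threshold rule on a one-dimensional Fisher projection, and the Rayleigh coefficient maximization is instanced by the classical Fisher discriminant direction w ∝ S_w⁻¹(m₁ − m₀) (one of the mean-discriminating features the abstract alludes to). All function names and parameters here are hypothetical.

```python
import numpy as np

def fisher_direction(X0, X1, reg=1e-6):
    """Rayleigh-coefficient-maximizing direction for class means.

    Maximizes J(w) = (w^T S_b w) / (w^T S_w w); the closed-form
    solution is w proportional to S_w^{-1} (m1 - m0).
    """
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0.T) + np.cov(X1.T) + reg * np.eye(X0.shape[1])
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

def iterative_semisupervised(X_lab, y_lab, X_unlab, n_iter=5):
    """Iterative self-training with feature re-extraction.

    Each iteration: (i) re-extract the projection from the augmented
    training set, (ii) classify the unlabeled data, then augment the
    training set with the predicted labels for the next round.
    A threshold on the projected feature stands in for the SVM.
    """
    X_aug, y_aug = X_lab, y_lab
    for _ in range(n_iter):
        # Step (i): feature re-extraction from the augmented set.
        w = fisher_direction(X_aug[y_aug == 0], X_aug[y_aug == 1])
        f = X_aug @ w
        thresh = 0.5 * (f[y_aug == 0].mean() + f[y_aug == 1].mean())
        sign = 1.0 if f[y_aug == 1].mean() > thresh else -1.0
        # Step (ii): classify unlabeled data on the extracted feature.
        y_pred = ((X_unlab @ w - thresh) * sign > 0).astype(int)
        # Augment the training set with predicted labels.
        X_aug = np.vstack([X_lab, X_unlab])
        y_aug = np.concatenate([y_lab, y_pred])
    return y_pred, w
```

With well-separated Gaussian classes and only a handful of labeled points, the predicted labels stabilize after a few iterations, which is the behavior the paper's algorithm exploits on small EEG training sets.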
