The effect of methods addressing the class imbalance problem on P300 detection

This paper empirically studies how different sampling methods affect the training of classifiers on the imbalanced data of the BCI P300 Speller. Both over-sampling and under-sampling are considered. Besides existing methods such as SMOTE, which have been shown to be effective against the class imbalance problem, we also propose a new under-sampling technique, the instance-remove algorithm, which is based on a property of P300 data sets. The classifiers tested are FLDA and linear SVM. Experimental results suggest that not all sampling methods are effective for P300 detection, and that the same method may influence different classifiers differently. In particular, SMOTE, a variant of over-sampling, is very effective for training an FLDA classifier, while the other methods are only slightly effective or ineffective for both FLDA and linear SVM. The study also suggests that over-sampling is more effective than under-sampling for both classifiers.
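To make the two families of sampling methods concrete, the following is a minimal sketch of the resampling step only, not the paper's implementation. It assumes toy two-dimensional Gaussian features and a 1:5 target to non-target ratio (as in a standard 6x6 row/column speller, where 2 of every 12 flashes contain the target). The function names `smote` and `random_undersample` are hypothetical. The abstract does not describe the instance-remove algorithm, so plain random under-sampling stands in for the under-sampling side.

```python
import random


def random_undersample(majority, n_keep, rng):
    """Keep a random subset of the majority (non-target) class."""
    return rng.sample(majority, n_keep)


def smote(minority, n_new, k, rng):
    """Create n_new synthetic minority samples, SMOTE-style.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` within the minority class
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (an assumption, not P300 recordings): 20 target, 100 non-target.
rng = random.Random(0)
target = [[rng.gauss(1.0, 1.0), rng.gauss(1.0, 1.0)] for _ in range(20)]
non_target = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(100)]

# Over-sampling: grow the target class until it matches the non-target class.
over = target + smote(target, len(non_target) - len(target), k=5, rng=rng)
# Under-sampling: shrink the non-target class to the size of the target class.
under = random_undersample(non_target, len(target), rng)

print(len(over), len(non_target))  # 100 100
print(len(under), len(target))     # 20 20
```

Either balanced set would then be passed to the classifier (FLDA or linear SVM in the study). Over-sampling keeps every recorded non-target epoch, whereas under-sampling discards most of them, which is one plausible reason the two approaches can behave differently.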
