Classifier transfer with data selection strategies for online support vector machine classification with class imbalance

OBJECTIVE Classifier transfers usually come with dataset shifts. To overcome dataset shifts in practical applications, this paper considers the adaptation of batch learning algorithms, such as the support vector machine (SVM), under limited computational resources.

APPROACH We focus on data selection strategies that limit the size of the stored training data by different inclusion, exclusion, and further dataset manipulation criteria, such as handling class imbalance with two new approaches. We compare the strategies with linear SVMs on several synthetic datasets with different data shifts, as well as on different transfer settings with electroencephalographic (EEG) data.

MAIN RESULTS For the synthetic data, adding only misclassified samples performed surprisingly well. Here, balancing criteria were very important when the other criteria were not well chosen. For the transfer setups, the results show that the best strategy depends on the intensity of the drift during the transfer. Adding all samples and removing the oldest ones results in the best performance, whereas for smaller drifts it can be sufficient to add only samples near the decision boundary of the SVM, which reduces processing resources.

SIGNIFICANCE For brain-computer interfaces based on EEG data, models are used that were trained on data from a calibration session, a previous recording session, or even a recording session with another subject. We show that, by using the right combination of data selection criteria, it is possible to adapt the SVM classifier to overcome the performance drop caused by the transfer.
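The abstract names the ingredients of a data selection strategy (inclusion, exclusion, and balancing criteria around a retrained batch SVM) without spelling them out. The following minimal Python sketch illustrates the general idea under stated assumptions; it is not the authors' implementation, and its balancing rule is not one of the paper's two approaches. Assumed here: labels in {-1, +1} are available for new samples; the buffer has a fixed maximum size; inclusion is "all", "misclassified" (y f(x) <= 0), or "boundary" (|f(x)| < 1, an assumed margin threshold); exclusion removes the oldest sample; and a hypothetical balancing rule removes the oldest sample of the currently larger class instead. A deterministic Pegasos-style subgradient solver stands in for the batch SVM solver and is rerun from scratch after each buffer change. The names `decision`, `train_linear_svm`, `DataSelector`, and `run_stream` are hypothetical.

```python
from collections import deque


def decision(w, x):
    """Linear decision value f(x) = <w[:-1], x> + w[-1]; the last weight is the offset."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]


def train_linear_svm(samples, dim, lam=0.1, epochs=20):
    """Retrain a linear SVM from scratch with a deterministic Pegasos-style
    subgradient method on the hinge loss (a stand-in for a batch SVM solver).
    The offset is handled as an extra constant feature and is regularized too."""
    w = [0.0] * (dim + 1)
    t = 0
    for _ in range(epochs):
        for x, y in samples:
            t += 1
            eta = 1.0 / (lam * t)
            xa = list(x) + [1.0]  # augmented sample for the offset
            margin = y * sum(wi * xi for wi, xi in zip(w, xa))
            w = [(1.0 - eta * lam) * wi for wi in w]  # L2 regularizer step
            if margin < 1.0:  # hinge loss active: step towards y * x
                w = [wi + eta * y * xi for wi, xi in zip(w, xa)]
    return w


class DataSelector:
    """Hypothetical bounded training buffer: one inclusion criterion,
    'remove oldest' as exclusion criterion, optional class balancing."""

    def __init__(self, max_size, inclusion="all", balance=False):
        if inclusion not in ("all", "misclassified", "boundary"):
            raise ValueError(f"unknown inclusion criterion: {inclusion}")
        self.max_size = max_size
        self.inclusion = inclusion
        self.balance = balance
        self.buffer = deque()  # (x, y) pairs with y in {-1, +1}, oldest first

    def should_add(self, x, y, w):
        f = decision(w, x)
        if self.inclusion == "all":
            return True
        if self.inclusion == "misclassified":
            return y * f <= 0
        return abs(f) < 1.0  # "boundary": inside the margin band (assumed threshold)

    def update(self, x, y, w):
        """Apply the criteria to a new labeled sample; True if the buffer changed."""
        if not self.should_add(x, y, w):
            return False
        self.buffer.append((x, y))
        if len(self.buffer) > self.max_size:
            self._remove_one()
        return True

    def _remove_one(self):
        if not self.balance:
            self.buffer.popleft()  # exclusion: drop the oldest sample
            return
        # Hypothetical balancing rule: drop the oldest sample of the larger class.
        pos = sum(1 for _, y in self.buffer if y > 0)
        majority = 1 if pos >= len(self.buffer) - pos else -1
        for i, (_, y) in enumerate(self.buffer):
            if y == majority:
                del self.buffer[i]
                return


def run_stream(stream, selector, dim, initial):
    """Start from initial (e.g. calibration) data, then adapt on a labeled stream."""
    selector.buffer.extend(initial[-selector.max_size:])
    w = train_linear_svm(list(selector.buffer), dim)
    for x, y in stream:
        if selector.update(x, y, w):
            w = train_linear_svm(list(selector.buffer), dim)  # batch retraining
    return w
```

Retraining from scratch after every buffer change keeps the batch solver unchanged; in this sketch, the "misclassified" and "boundary" criteria save processing time mainly because they add fewer samples and therefore trigger fewer retrainings.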
