BENDR: Using Transformers and a Contrastive Self-Supervised Learning Task to Learn From Massive Amounts of EEG Data

Deep neural networks (DNNs) used for brain–computer interface (BCI) classification are commonly expected to learn general features when trained across a variety of contexts, such that these features can then be fine-tuned to specific contexts. While this approach has met with some success, we argue that this interpretation is limited and that an alternative would better leverage the newly (and publicly) available massive electroencephalography (EEG) datasets. We consider how to adapt techniques and architectures used for language modeling (LM), which appear capable of ingesting awesome amounts of data, toward the development of encephalography modeling with DNNs in the same vein. Specifically, we adapt an approach used effectively for automatic speech recognition, which, like LMs, uses a self-supervised training objective to learn compressed representations of raw data signals. After adaptation to EEG, we find that a single pre-trained model is capable of modeling completely novel raw EEG sequences recorded with differing hardware and by different subjects performing different tasks. Furthermore, both the internal representations of this model and the entire architecture can be fine-tuned to a variety of downstream BCI and EEG classification tasks, outperforming prior work that used more task-specific (sleep stage classification) self-supervision.
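The abstract describes the pre-training recipe only at a high level, but the wav2vec 2.0-style contrastive task it adapts (a convolutional encoder compresses raw signals, spans of its output are masked, and a transformer contextualizer must identify the original encoder vector for each masked position among distractors) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the module sizes, masking rate, negative-sampling scheme, and the `contrastive_loss` helper are all assumptions made for clarity.

```python
# Minimal sketch of a wav2vec 2.0-style contrastive pre-training step on raw EEG,
# in the spirit of BENDR. Shapes, hyperparameters, and helper names are illustrative
# assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvEncoder(nn.Module):
    """Downsamples raw multi-channel EEG into a sequence of feature vectors."""
    def __init__(self, n_channels=20, feat_dim=512, n_blocks=6):
        super().__init__()
        layers, in_ch = [], n_channels
        for _ in range(n_blocks):
            layers += [nn.Conv1d(in_ch, feat_dim, kernel_size=3, stride=2, padding=1),
                       nn.GroupNorm(16, feat_dim), nn.GELU()]
            in_ch = feat_dim
        self.net = nn.Sequential(*layers)

    def forward(self, x):                       # x: (batch, channels, samples)
        return self.net(x).transpose(1, 2)      # (batch, seq_len, feat_dim)


class Contextualizer(nn.Module):
    """Transformer that re-predicts masked encoder outputs from surrounding context."""
    def __init__(self, feat_dim=512, n_layers=8, n_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads,
                                           dim_feedforward=4 * feat_dim,
                                           batch_first=True, activation="gelu")
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.mask_token = nn.Parameter(torch.randn(feat_dim))

    def forward(self, feats, mask):             # mask: (batch, seq_len) bool
        feats = feats.clone()
        feats[mask] = self.mask_token           # replace masked positions with a learned token
        return self.transformer(feats)


def contrastive_loss(context, targets, mask, n_negatives=20, temperature=0.1):
    """InfoNCE-style loss: the transformer output at each masked position should be
    closer to the original (unmasked) encoder vector than to negatives drawn from
    other positions of the same sequence."""
    b, t, d = targets.shape
    pos_ctx = context[mask]                     # (n_masked, d)
    pos_tgt = targets[mask]                     # (n_masked, d)
    # Sample negatives uniformly from each masked position's own sequence.
    neg_idx = torch.randint(0, t, (pos_ctx.size(0), n_negatives), device=targets.device)
    batch_idx = mask.nonzero(as_tuple=True)[0].unsqueeze(1).expand(-1, n_negatives)
    negatives = targets[batch_idx, neg_idx]     # (n_masked, n_negatives, d)
    candidates = torch.cat([pos_tgt.unsqueeze(1), negatives], dim=1)
    logits = F.cosine_similarity(pos_ctx.unsqueeze(1), candidates, dim=-1) / temperature
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)      # the positive is always candidate 0


# One illustrative pre-training step on a random batch standing in for raw EEG.
encoder, contextualizer = ConvEncoder(), Contextualizer()
x = torch.randn(4, 20, 4096)                    # (batch, channels, samples)
feats = encoder(x)
mask = torch.rand(feats.shape[:2]) < 0.15       # mask ~15% of encoded positions
loss = contrastive_loss(contextualizer(feats, mask), feats.detach(), mask)
loss.backward()
```

For downstream use, the same encoder (and optionally the contextualizer) would be kept and a small classification head fine-tuned on labeled BCI or sleep-staging data, which is the transfer setting the abstract evaluates.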
