A Deep Spatio-Temporal Model for EEG-Based Imagined Speech Recognition
Automatic speech recognition interfaces are becoming increasingly pervasive in daily life as a means of interacting with and controlling electronic devices. Current speech interfaces, however, are infeasible for a variety of users and use cases, such as patients who suffer from locked-in syndrome or those who require privacy. In these cases, an interface based on imagined speech, the act of imagining what one wants to say without articulating it, could be of benefit. Consequently, in this work we propose an imagined speech Brain-Computer Interface (BCI) using Electroencephalogram (EEG) signals. EEG signals are processed by a deep spatio-temporal learning architecture that combines 1D Convolutional Neural Networks (CNNs) for spatial feature extraction with Long Short-Term Memory (LSTM) units for temporal modeling. The LSTM units are implemented in a many-to-many fashion, producing a time series of imagined speech predictions. Majority vote (MV) post-processing over this series is then used to further improve classification accuracy. Performance is evaluated on two publicly available datasets: one to test the tuned model and another to test its generalization to a new dataset. The proposed architecture outperforms previous results with improvements of up to 23.7%.
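The sketch below illustrates, in PyTorch, the kind of CNN-LSTM pipeline with many-to-many outputs and majority-vote post-processing described in the abstract. Layer sizes, channel counts, kernel sizes, and the number of imagined-speech classes are assumptions for illustration only and are not the paper's reported configuration.

```python
# Illustrative sketch only: a minimal 1D-CNN + LSTM pipeline with many-to-many
# outputs and majority-vote post-processing. All hyperparameters are assumed.
import torch
import torch.nn as nn


class CNNLSTMImaginedSpeech(nn.Module):
    def __init__(self, n_channels=64, n_classes=5, hidden_size=128):
        super().__init__()
        # 1D convolutions extract features across EEG channels at each
        # time step (channel counts and kernel sizes are illustrative).
        self.cnn = nn.Sequential(
            nn.Conv1d(n_channels, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(128, 64, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        # LSTM models temporal dynamics; used many-to-many, so every
        # time step yields a class prediction.
        self.lstm = nn.LSTM(64, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, n_classes)

    def forward(self, x):
        # x: (batch, n_channels, time)
        feats = self.cnn(x)                 # (batch, 64, time)
        feats = feats.permute(0, 2, 1)      # (batch, time, 64)
        out, _ = self.lstm(feats)           # (batch, time, hidden_size)
        return self.fc(out)                 # (batch, time, n_classes)


def majority_vote(per_step_logits):
    # Collapse the per-time-step predictions into one label per trial
    # by taking the most frequent predicted class.
    preds = per_step_logits.argmax(dim=-1)  # (batch, time)
    return torch.mode(preds, dim=1).values  # (batch,)


if __name__ == "__main__":
    model = CNNLSTMImaginedSpeech()
    eeg = torch.randn(8, 64, 256)   # 8 trials, 64 channels, 256 samples (assumed)
    logits = model(eeg)
    labels = majority_vote(logits)
    print(labels.shape)             # torch.Size([8])
```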