Speech neuromuscular decoding based on spectrogram images using conformal predictors with Bi-LSTM

Abstract

The relationships between muscle movements and neural signals make it possible to decode silent speech from neuromuscular activity. The decoding can be formulated as a supervised classification task. Electromyography (EMG) captured from surface articulatory muscles contains information that can assist in decoding speech. Spectrograms obtained from EMG carry a wealth of information relevant to the decoding, but they have not yet been fully explored. In addition, decoding results are often uncertain, so it is important to quantify prediction confidence. This paper aims to improve decoding performance by representing the time-series signals as spectrograms and by using Inductive Conformal Prediction (ICP) to provide predictions with confidence. All EMG data are recorded from six targeted facial muscles while participants recite displayed words subvocally. Three pre-trained convolutional models, MobileNet-V1, ResNet18 and Xception, are used to extract features from the spectrograms for classification. Both bidirectional Long Short-Term Memory (Bi-LSTM) and Gated Recurrent Unit (GRU) classifiers are used for prediction. Furthermore, an ICP decoder built on the Bi-LSTM provides, for each example, predictions with a validity guarantee at a specified confidence level. The proposed combination of Xception-based feature extraction and Bi-LSTM classification achieves an accuracy of 0.87, higher than the other methods evaluated. ICP outputs a confidence measure for each example, which helps users assess the reliability of new predictions. Experimental results demonstrate the practical usefulness of the approach for decoding articulatory neuromuscular activity and the advantages of applying ICP.
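For concreteness, the pipeline of pre-trained CNN feature extraction followed by Bi-LSTM classification might look like the following minimal PyTorch sketch. This is an illustration under assumed settings, not the paper's implementation: the ResNet18 backbone (one of the three models named above) stands in for the best-performing Xception, which torchvision does not ship, and the class count, hidden size, and sequence layout (one spectrogram image per muscle channel) are placeholders.

```python
import torch
import torch.nn as nn
from torchvision import models

class SpectrogramBiLSTM(nn.Module):
    """Pre-trained CNN feature extractor followed by a Bi-LSTM classifier.

    Each input is a sequence of spectrogram images (here, one per EMG
    channel); the CNN embeds each image and the Bi-LSTM classifies the
    resulting feature sequence. All sizes below are illustrative.
    """
    def __init__(self, num_classes=10, hidden_size=128):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        backbone.fc = nn.Identity()  # drop the ImageNet head; ResNet18 then emits 512-d features
        self.extractor = backbone
        self.lstm = nn.LSTM(input_size=512, hidden_size=hidden_size,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, seq_len, 3, H, W) -- a sequence of RGB spectrogram images
        b, t = x.shape[:2]
        feats = self.extractor(x.flatten(0, 1))  # (b*t, 512)
        feats = feats.view(b, t, -1)             # (b, t, 512)
        out, _ = self.lstm(feats)                # (b, t, 2*hidden_size)
        return self.head(out[:, -1])             # classify from the final step

model = SpectrogramBiLSTM(num_classes=10)
logits = model(torch.randn(2, 6, 3, 224, 224))   # e.g. one spectrogram per muscle
print(logits.shape)                              # torch.Size([2, 10])
```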
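The ICP step can likewise be sketched as a standard inductive conformal classifier operating on the softmax outputs of an underlying model such as the Bi-LSTM above. This is a generic ICP sketch rather than the paper's exact decoder: the nonconformity score (one minus the softmax probability of the hypothesised label) and the toy calibration data are assumptions made for illustration.

```python
import numpy as np

def icp_prediction_sets(cal_probs, cal_labels, test_probs, epsilon=0.1):
    """Inductive conformal prediction for classification.

    Nonconformity score: 1 - softmax probability of the (hypothesised) label.
    A label enters an example's prediction set when its p-value exceeds the
    significance level epsilon, giving coverage >= 1 - epsilon on average.
    """
    # Calibration scores: one per held-out calibration example, at its true label.
    cal_scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    n = len(cal_scores)

    sets = []
    for probs in test_probs:
        labels = []
        for y, p in enumerate(probs):
            score = 1.0 - p  # nonconformity of hypothesised label y
            # Conservative p-value: share of calibration scores at least as large.
            p_value = (np.sum(cal_scores >= score) + 1) / (n + 1)
            if p_value > epsilon:
                labels.append(y)
        sets.append(labels)
    return sets

# Toy usage with fabricated softmax outputs (illustrative only):
rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(4), size=50)
cal_labels = rng.integers(0, 4, size=50)
test_probs = rng.dirichlet(np.ones(4), size=3)
print(icp_prediction_sets(cal_probs, cal_labels, test_probs, epsilon=0.1))
```

At significance level epsilon = 0.1, the returned sets contain the true label with probability at least 0.9 on average; singleton sets correspond to confident point predictions, while larger sets flag the uncertain examples that the abstract argues users should be able to recognise.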