Improving EEG feature learning via synchronized facial video

Modern physiological analysis increasingly involves multiple types of information. Electroencephalogram (EEG) signals, as a typical example, are now analyzed together with facial expression videos to detect emotions. Emotions play an important role in daily human life, and the need for automatic emotion recognition has grown with the increasing prevalence of human-computer interface applications. In this paper, we concentrate on recognizing emotions jointly from "inner" and "outer" reactions, namely EEG signals and facial expression video. Due to the streaming nature of this problem, the data volume and velocity pose significant challenges. We address these challenges from a theoretical perspective and propose a real-time algorithm that learns feature vectors jointly from EEG signals and synchronized facial video. Our algorithm consists of an unsupervised EEG dictionary component based on deep learning, and a probability pooling component that transforms a continuous sequential signal into an EEG "sentence" consisting of a sequence of EEG words. The EEG sentence is then jointly learned with video features to form a new fixed-length feature representation for emotion classification. We overcome several computational challenges using the ideas of convolution and pooling, and we conduct extensive evaluations of each component of our model. We also demonstrate state-of-the-art classification results on a real-world dataset. The superior performance on the emotion recognition task indicates that 1) the natural-language analogy can be applied to EEG sequences and 2) borrowing the video modality can improve overall performance.
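To make the overall pipeline concrete, the following is a minimal sketch of the abstract's workflow under simplifying assumptions: EEG recordings are segmented into windows, each window is mapped to a dictionary "word", the word assignments are pooled into a fixed-length "sentence" histogram, and that histogram is concatenated with video features for classification. The k-means codebook and linear SVM here are stand-ins, not the paper's deep unsupervised dictionary or its probability pooling; all function names, window sizes, and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def segment_windows(eeg, win_len, hop):
    """Slice a (channels, samples) EEG recording into overlapping windows,
    flattened to vectors (hypothetical preprocessing step)."""
    n = eeg.shape[1]
    return np.stack([eeg[:, s:s + win_len].ravel()
                     for s in range(0, n - win_len + 1, hop)])

def eeg_sentence(windows, codebook, n_words):
    """Map each window to its nearest dictionary 'word' and pool the
    assignments into a fixed-length histogram (a simple proxy for the
    paper's probability pooling into an EEG 'sentence')."""
    words = codebook.predict(windows)
    hist = np.bincount(words, minlength=n_words).astype(float)
    return hist / max(hist.sum(), 1.0)

# --- toy usage with synthetic data; all shapes are illustrative only ---
rng = np.random.default_rng(0)
n_words, win_len, hop = 32, 128, 64
train_eeg = [rng.standard_normal((14, 2048)) for _ in range(20)]  # 20 trials
video_feat = rng.standard_normal((20, 64))    # per-trial video features
labels = rng.integers(0, 2, size=20)          # binary emotion labels

# Learn the EEG "dictionary" from all training windows (unsupervised);
# the paper uses a deep model, k-means is only a placeholder codebook.
all_windows = np.vstack([segment_windows(x, win_len, hop) for x in train_eeg])
codebook = KMeans(n_clusters=n_words, n_init=5, random_state=0).fit(all_windows)

# Joint fixed-length features: EEG sentence histogram + video features.
eeg_feats = np.stack([
    eeg_sentence(segment_windows(x, win_len, hop), codebook, n_words)
    for x in train_eeg
])
joint = np.hstack([eeg_feats, video_feat])

clf = LinearSVC(C=1.0).fit(joint, labels)
print("train accuracy:", clf.score(joint, labels))
```

The design point this sketch illustrates is that both modalities end up in a single fixed-length vector, so any standard classifier can consume them regardless of the original recording length.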