Multimodal Emotion Recognition

Multimodal fusion combines two or more input modalities to achieve higher classification accuracy than any single unimodal system can reach alone. It is a popular technique in emotion recognition. In this study, we investigated how much decision-level fusion could improve upon individual unimodal systems. To this end, we acquired two emotion classification systems, one operating on audio input alone and the other on visual input, and combined their outputs using a set of manual rules and a classifier, with the aim of achieving higher classification accuracy.
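
To make the idea of decision-level fusion concrete, the following is a minimal sketch in Python and not the rule set or classifier used in this study. It assumes that each unimodal system returns a score per emotion label. The function name `fuse_decisions`, the `audio_weight` parameter, and the two rules (keep the label when both systems agree, otherwise take the label with the highest weighted sum of scores) are hypothetical choices made for illustration only.

```python
from typing import Dict

# Hypothetical type: maps an emotion label to the score a unimodal system gave it.
Scores = Dict[str, float]


def fuse_decisions(audio: Scores, visual: Scores, audio_weight: float = 0.5) -> str:
    """Combine the outputs of two unimodal classifiers at the decision level.

    Illustrative rules (assumptions, not the rules used in the study):
      1. If both systems pick the same label, return that label.
      2. Otherwise, return the label with the highest weighted sum of scores.
    """
    if not audio or not visual:
        raise ValueError("both score dictionaries must be non-empty")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")

    # Top-scoring label of each unimodal system.
    audio_label = max(audio, key=audio.get)
    visual_label = max(visual, key=visual.get)

    # Rule 1: the two modalities agree.
    if audio_label == visual_label:
        return audio_label

    # Rule 2: the modalities disagree, so fall back to a weighted sum.
    # A label missing from one system contributes a score of 0 for that system.
    labels = sorted(set(audio) | set(visual))
    fused = {
        label: audio_weight * audio.get(label, 0.0)
        + (1.0 - audio_weight) * visual.get(label, 0.0)
        for label in labels
    }
    return max(fused, key=fused.get)
```

With equal weights, a confident visual prediction overrides a marginal audio prediction when the two disagree. A trained classifier could take the place of the hand-written rules by using the concatenated score vectors of both systems as its input features.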