A multi-feature fusion support vector machine for emotion classification in the design of an audio recognition system
Most state-of-the-art automatic speech emotion recognition systems rely on utterance-level statistics of features. In this study, each spoken utterance is represented by a set of statistics computed over all of its frames for several different features. Because the individual feature types (spectral, prosodic, and voice quality features) carry complementary emotion-specific information, an intelligent combination of features is expected to help. In this work, we use contour-based low-level descriptors to extract features from the emotional data and then fuse the evidence provided by these features. Finally, multi-class SVM classification is performed on the extracted features. The experiments were carried out on the Berlin corpus, using six emotion classes: sadness, boredom, fear, happiness, anger, and the neutral state (no emotion). The results demonstrate that, on average, features obtained from different information streams and combined at the decision level outperform single features or features combined at the feature level in terms of classification accuracy.
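To make the described pipeline concrete, the sketch below shows one plausible reading of it in Python with scikit-learn: frame-level low-level descriptors are collapsed into utterance-level statistics, one multi-class SVM is trained per feature stream, and the streams are fused either at the feature level (concatenation) or at the decision level (averaged posteriors). The function names, the statistic set, the RBF kernel, and the posterior-averaging rule are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def utterance_stats(frames):
    """Collapse an (n_frames, n_lld) matrix of frame-level low-level
    descriptors into one utterance-level statistics vector
    (mean, std, min, max per descriptor; this set is an assumption)."""
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0),
                           frames.min(axis=0), frames.max(axis=0)])


def fit_stream_svm(X, y):
    """Train one multi-class SVM on a single feature stream.
    probability=True enables Platt-scaled posteriors for fusion."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
    clf.fit(X, y)
    return clf


def fuse_decisions(classifiers, Xs):
    """Decision-level fusion: average the per-stream posterior
    estimates and pick the most probable emotion class."""
    probs = np.mean([c.predict_proba(X) for c, X in zip(classifiers, Xs)],
                    axis=0)
    return classifiers[0].classes_[probs.argmax(axis=1)]


def feature_level_fusion(Xs):
    """Feature-level baseline: concatenate the stream vectors so a
    single SVM can be trained on the stacked representation."""
    return np.hstack(Xs)


# Hypothetical usage with three streams (spectral, prosodic, voice quality);
# Xs_train[i] has shape (n_utterances, n_stats_i) and y holds emotion labels:
#   clfs = [fit_stream_svm(X, y) for X in Xs_train]
#   y_pred = fuse_decisions(clfs, Xs_test)
```

In this reading, the paper's decision-level combination corresponds to fuse_decisions over the per-stream classifiers, while the feature-level baseline it is compared against trains a single SVM on feature_level_fusion(Xs).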