Detection of affective states from speech signals using ensembles of classifiers

Recent research has focused on designing classifier systems that achieve optimal classification accuracy in detecting affective states from speech signals. Previous work has shown that single-classifier models are inadequate for this problem. In this work we propose using ensemble learning techniques, namely random forest and kernel factory, as the classifiers in a speech emotion recognition system. The system proceeds from speech signal pre-processing through feature extraction and classifier construction to emotion prediction. Features related to fundamental frequency, energy, mel-frequency cepstral coefficients and linear predictive cepstral coefficients were extracted from individual speech segments. We then trained two ensemble classifiers, random forest and kernel factory, on these features. We tested the approach on several speech databases. The results show that the ensemble classifiers outperformed single models, with gains in classification performance of up to 20%. Our results also exceed those reported in existing studies. We conclude that ensemble classifiers are effective for identifying emotions and are therefore well suited to this domain.
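To make the idea of an ensemble classifier concrete, the following is a minimal sketch of the last two stages of the pipeline (classifier construction and prediction). It is not the system described in this work: it uses a bagged ensemble of one-feature decision stumps with majority voting as a simplified stand-in for random forest, and it skips pre-processing and feature extraction entirely. The two features (a mean fundamental frequency in Hz and a mean energy value), their numeric values, the labels "neutral" and "angry", and all function names are assumptions made up for illustration.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Fit a one-feature decision stump by exhaustive threshold search."""
    values = sorted({r[feature] for r in rows})
    # Candidate thresholds: midpoints between adjacent distinct values
    # (fall back to the single value if the sample has only one).
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
    best = None
    for t in candidates:
        left = Counter(l for r, l in zip(rows, labels) if r[feature] <= t)
        right = Counter(l for r, l in zip(rows, labels) if r[feature] > t)
        # Each side predicts its majority label; an empty side borrows the other's.
        left_label = (left or right).most_common(1)[0][0]
        right_label = (right or left).most_common(1)[0][0]
        errors = sum(
            l != (left_label if r[feature] <= t else right_label)
            for r, l in zip(rows, labels)
        )
        if best is None or errors < best[0]:
            best = (errors, feature, t, left_label, right_label)
    return best[1:]  # (feature, threshold, left_label, right_label)


def fit_ensemble(rows, labels, n_members=25, seed=0):
    """Train each stump on a bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feature = rng.randrange(n_features)
        members.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature)
        )
    return members


def predict(members, row):
    """Combine the members' predictions by majority vote."""
    votes = Counter(
        left if row[feature] <= t else right
        for feature, t, left, right in members
    )
    return votes.most_common(1)[0][0]


# Invented toy data: (mean F0 in Hz, mean energy) per speech segment.
rows = [(100.0 + 3 * i, 0.20 + 0.01 * i) for i in range(10)] + [
    (200.0 + 4 * i, 0.60 + 0.02 * i) for i in range(10)
]
labels = ["neutral"] * 10 + ["angry"] * 10

members = fit_ensemble(rows, labels)
accuracy = sum(predict(members, r) == l for r, l in zip(rows, labels)) / len(rows)
print("training accuracy:", accuracy)
print("new segment:", predict(members, (110.0, 0.25)))
```

The toy classes are deliberately well separated, so the sketch only demonstrates the mechanics of bootstrap sampling, random feature choice and voting. It says nothing about the performance figures reported here, which depend on real speech features and on the actual random forest and kernel factory models.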