Emotion recognition using an ensemble of cepstral, perceptual and temporal features

Despite the great progress made in Artificial Intelligence, machines still struggle to identify a speaker's emotional state, a task that is hard even for humans. Research has been carried out to improve the recognition of a speaker's emotional state in order to achieve better Human Computer Interaction (HCI). Speech samples are characterized by many features, and our aim is to define a set of features that contributes significantly to the classification of emotions. In our approach, we extract a few of these features, analyze them, and use them to classify emotions. Classifying with each of these features individually is expected to yield varying accuracy. A feature ensemble is therefore adopted: the features that provide greater accuracy are selected and grouped together. Experiments are carried out on the EMO-DB corpus to analyze the effect of the ensemble on speech emotion recognition (SER). Among the small pool of features, the selection and ensemble of cepstral, perceptual and temporal features provides a 12.47% improvement in accuracy compared with the best individual feature in the pool.
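The per-feature-then-ensemble procedure described above can be illustrated with a minimal sketch. It assumes a nearest-centroid classifier, ranking feature groups by their individual validation accuracy and keeping the top `keep`, and plain concatenation as the ensemble step. The classifier, the top-k rule, the group names `cepstral`, `perceptual`, `temporal` and `weak`, and all the numbers are hypothetical stand-ins: the abstract does not state which classifier or selection rule the paper uses, and the toy vectors are not EMO-DB data.

```python
from statistics import mean

# Hypothetical sketch: nearest-centroid classifier + top-k feature-group ensemble.

def fit_centroids(X, y):
    # One centroid (per-dimension mean) per emotion label.
    return {c: [mean(col) for col in zip(*[x for x, lab in zip(X, y) if lab == c])]
            for c in sorted(set(y))}

def predict(centroids, x):
    # Label of the nearest centroid under squared Euclidean distance.
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

def accuracy(X_tr, y_tr, X_ev, y_ev):
    cents = fit_centroids(X_tr, y_tr)
    return sum(predict(cents, x) == lab for x, lab in zip(X_ev, y_ev)) / len(y_ev)

def concat(groups, names, i):
    # Ensemble vector for sample i: the chosen groups' features joined end to end.
    return [v for n in names for v in groups[n][i]]

def ensemble_select(train, y_tr, val, y_val, keep=3):
    # 1) Score each feature group on its own.
    scores = {n: accuracy(train[n], y_tr, val[n], y_val) for n in train}
    # 2) Keep the `keep` best groups (assumed selection rule).
    chosen = sorted(scores, key=scores.get, reverse=True)[:keep]
    # 3) Concatenate the chosen groups and score the combined vectors.
    X_tr = [concat(train, chosen, i) for i in range(len(y_tr))]
    X_val = [concat(val, chosen, i) for i in range(len(y_val))]
    return scores, chosen, accuracy(X_tr, y_tr, X_val, y_val)

# Toy, hand-made 1-D feature groups (hypothetical values).
y_tr = ["anger", "anger", "neutral", "neutral"]
y_val = ["anger", "neutral", "anger", "neutral"]
train = {
    "cepstral":   [[1.0], [1.2], [0.0], [0.2]],
    "perceptual": [[2.0], [2.2], [1.0], [1.2]],
    "temporal":   [[0.9], [0.8], [0.1], [0.3]],
    "weak":       [[0.5], [0.4], [0.5], [0.4]],
}
val = {
    "cepstral":   [[1.1], [0.1], [0.5], [0.2]],
    "perceptual": [[2.1], [1.1], [2.0], [1.8]],
    "temporal":   [[0.8], [0.6], [0.7], [0.2]],
    "weak":       [[0.5], [0.5], [0.5], [0.5]],
}

scores, chosen, ens = ensemble_select(train, y_tr, val, y_val, keep=3)
print(scores, chosen, ens)
```

In this toy setup each informative group misclassifies a different validation sample, so the concatenated vector can outperform every single group. Note that selecting groups and reporting accuracy on the same split is optimistic; a real experiment would report the ensemble's accuracy on a separate test split.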