Gender and Emotion Classification by Hierarchical Modelling Using Convolutional Neural Networks

Sound is produced by vibration; in humans, this task is performed by the larynx. We speak to communicate and to convey our feelings to one another, and interest in speech acoustics within computer science has grown accordingly. Applications such as automatic speech recognition and the recognition of age, gender, prosody, emotion, and sentiment from speech signals are paving the way for better human-machine interaction. In this paper we attempt to predict gender and emotion from speech signals and present a detailed comparison of the four models we developed, highlighting the relationship between gender and emotion classification accuracies. Our results show that training separate emotion recognition models for male and female voices yields higher accuracy than using a single model for both.
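The hierarchical approach the abstract describes amounts to a two-stage pipeline: a gender classifier first routes each utterance, and a gender-specific emotion CNN then makes the final prediction. Below is a minimal sketch of that routing logic in Python with Keras/TensorFlow. The input shape, layer sizes, emotion labels, and function names are illustrative assumptions for exposition, not the paper's actual architecture or class set.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Hypothetical emotion label set; the paper's actual classes may differ.
EMOTIONS = ["angry", "happy", "neutral", "sad"]

def build_cnn(n_outputs, input_shape=(128, 128, 1)):
    """Small 2-D CNN over a spectrogram-like input; purely illustrative."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_outputs, activation="softmax"),
    ])

# Stage 1: binary gender classifier. Stage 2: one emotion model per gender.
gender_model = build_cnn(n_outputs=2)
emotion_models = {"male": build_cnn(len(EMOTIONS)),
                  "female": build_cnn(len(EMOTIONS))}

def predict_hierarchical(spectrogram):
    """Route one spectrogram through the gender model, then the matching emotion model."""
    x = spectrogram[np.newaxis, ...]  # add batch dimension
    gender = ["male", "female"][int(np.argmax(gender_model.predict(x, verbose=0)))]
    emotion = EMOTIONS[int(np.argmax(emotion_models[gender].predict(x, verbose=0)))]
    return gender, emotion

# Example call with a random spectrogram-shaped input (untrained weights,
# so the output here is arbitrary).
print(predict_hierarchical(np.random.rand(128, 128, 1).astype("float32")))

The alternative the abstract compares against would collapse stage 2 into a single emotion model shared across genders; the reported finding is that the routed, per-gender models achieve higher emotion classification accuracy.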