A comparison of neural networks for real-time emotion recognition from speech signals

Speech and emotion recognition improve the quality of human-computer interaction and make software interfaces easier to use for users at every level. In this study, we developed two neural networks, the emotion recognition neural network (ERNN) and the Gram-Charlier emotion recognition neural network (GERNN), to classify voice signals for emotion recognition. The ERNN has 128 input nodes, 20 hidden neurons, and three summing output nodes. A collection of 97920 training sets is used to train the ERNN, and a separate collection of 24480 testing sets is used to test its performance. The voice samples tested are taken from the movies "Anger Management" and "Pick of Destiny". The ERNN achieves an average recognition performance of 100%, which suggests that it is a promising method for emotion recognition in computer applications. The GERNN has four input nodes, 20 hidden neurons, and three output nodes, and achieves an average recognition performance of 33%, which is roughly chance level if the three output classes are equally represented. This indicates that Gram-Charlier coefficients cannot be used to discriminate emotion signals. In addition, Hinton diagrams were used to display the optimality of the ERNN weights.
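The two layer layouts described above can be sketched as a small feed-forward pass. This is only an illustration of the stated sizes (128-20-3 for the ERNN, 4-20-3 for the GERNN), not the authors' implementation: the tanh hidden activation, the bias terms, the random initial weights, and the helper names `make_layer`, `build`, `forward`, and `n_params` are all assumptions, and no training step is shown.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return (weights, biases) for one fully connected layer (assumed init)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def build(n_in, n_hidden=20, n_out=3, seed=0):
    """Build a network with one hidden layer: n_in -> n_hidden -> n_out."""
    rng = random.Random(seed)
    return make_layer(n_in, n_hidden, rng), make_layer(n_hidden, n_out, rng)


def forward(x, hidden, output):
    """Forward pass: tanh hidden layer (assumption), plain weighted sums at the output."""
    w_h, b_h = hidden
    w_o, b_o = output
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w_h, b_h)]
    return [sum(w * hi for w, hi in zip(row, h)) + b
            for row, b in zip(w_o, b_o)]


def n_params(net):
    """Count weights plus biases over both layers."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in net)


ernn = build(128)   # 128 input nodes, 20 hidden neurons, 3 output nodes
gernn = build(4)    # 4 input nodes (Gram-Charlier coefficients), 20 hidden, 3 output

# A placeholder 128-value input vector; real inputs would come from the speech signal.
rng = random.Random(1)
x = [rng.uniform(-1.0, 1.0) for _ in range(128)]
scores = forward(x, *ernn)

# The predicted class is the index of the largest output sum.
predicted_class = max(range(3), key=lambda k: scores[k])
```

Under these assumptions the ERNN has 2643 trainable parameters and the GERNN has 163. The stated set sizes correspond to an 80/20 split, since 97920 of the 122400 sets in total are used for training.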
