Speech emotion recognition based on data enhancement in time-frequency domain

Speech emotion recognition currently suffers from a shortage of voice samples, which leads to low recognition rates and over-fitting. Motivated by this, we propose a speech emotion recognition approach based on data enhancement. The Berlin Emotional Corpus is enhanced in two directions, the time domain and the frequency domain; features are then extracted from the enhanced samples and used for training. We analyze the recognition rates of two classifiers, K-Nearest Neighbor and Support Vector Machine. Experiments show that recognition performance improves after data enhancement.
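
The abstract does not spell out which enhancement operations or features the authors use, so the following is only a minimal sketch of the general pipeline it describes, assuming librosa's time stretching as the time-domain enhancement, pitch shifting as the frequency-domain enhancement, mean MFCC vectors as features, and scikit-learn's KNN and SVM classifiers. All function names and parameter values are illustrative, not the authors' own.

```python
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def augment_time_domain(y, sr):
    """Time-domain enhancement: stretch the waveform without changing pitch."""
    return [librosa.effects.time_stretch(y, rate=r) for r in (0.9, 1.1)]


def augment_frequency_domain(y, sr):
    """Frequency-domain enhancement: shift the pitch without changing duration."""
    return [librosa.effects.pitch_shift(y, sr=sr, n_steps=s) for s in (-2, 2)]


def extract_features(y, sr):
    """Summarize each clip as its mean MFCC vector (a simple, common choice)."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)


def build_dataset(wav_paths, labels):
    """Expand each labeled clip with its augmented copies, then extract features."""
    X, Y = [], []
    for path, label in zip(wav_paths, labels):
        y, sr = librosa.load(path, sr=None)
        variants = [y] + augment_time_domain(y, sr) + augment_frequency_domain(y, sr)
        for v in variants:
            X.append(extract_features(v, sr))
            Y.append(label)
    return np.array(X), np.array(Y)


# Hypothetical usage with lists of EMO-DB file paths and emotion labels:
# X_train, y_train = build_dataset(train_paths, train_labels)
# knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
# svm = SVC(kernel="rbf").fit(X_train, y_train)
```

Note that augmentation should be applied only to the training partition: splitting after augmentation would leak near-duplicate variants of the same utterance into the test set and inflate the reported recognition rates.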
