Paper information - Emotional speech recognition based on modified parameter and distance of statistical model of pitch

Emotional speech recognition based on modified parameter and distance of statistical model of pitch

Based on the resolution of pitch, a modified Parzen-window method is proposed to obtain a statistical model of pitch; it maintains high resolution at low frequencies and eliminates jitter at high frequencies. A gender classification method using this statistical model is then proposed, and its accuracy reaches 98% when long sentences are classified. By analyzing the differences between genders, modified pitch parameters are proposed, and the following parameters are used for pattern classification: (1) modified means of pitch, (2) modified standard deviations of pitch, and (3) the Bhattacharyya distance between statistical models of pitch. Finally, an emotion recognition experiment based on K-Nearest Neighbor is described. A recognition rate of 81% is achieved with the proposed parameters, whereas only 73.8% is obtained with conventional parameters.
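
As a rough illustration of the third parameter, the sketch below estimates a pitch distribution on a frequency grid and computes the Bhattacharyya distance between two such distributions. It is a minimal sketch under stated assumptions: it uses a standard fixed-bandwidth Gaussian Parzen window rather than the modified window proposed in the paper (whose details are not given in the abstract), and the function names, grid, bandwidth, and pitch values are all hypothetical.

```python
import numpy as np


def parzen_density(pitch_hz, grid_hz, bandwidth_hz):
    """Standard Gaussian Parzen-window estimate of a pitch distribution.

    Assumption: fixed bandwidth. The paper's modified window is not
    reproduced here. The result is normalized to sum to 1 over the grid.
    """
    pitch_hz = np.asarray(pitch_hz, dtype=float)
    grid_hz = np.asarray(grid_hz, dtype=float)
    # One Gaussian kernel per pitch sample, evaluated at every grid point.
    scaled = (grid_hz[:, None] - pitch_hz[None, :]) / bandwidth_hz
    density = np.exp(-0.5 * scaled ** 2).sum(axis=1)
    return density / density.sum()


def bhattacharyya_distance(p, q, eps=1e-12):
    """Bhattacharyya distance between two discrete distributions p and q."""
    coefficient = np.sum(np.sqrt(p * q))
    # Guard against log(0) when the distributions do not overlap.
    return -np.log(max(coefficient, eps))


# Illustrative pitch values in Hz (made up, not data from the paper).
grid = np.arange(50.0, 500.0, 1.0)
model_a = parzen_density([118.0, 125.0, 131.0, 140.0], grid, bandwidth_hz=10.0)
model_b = parzen_density([205.0, 221.0, 236.0, 248.0], grid, bandwidth_hz=10.0)
distance_ab = bhattacharyya_distance(model_a, model_b)
```

A distance computed this way could be placed alongside the pitch mean and standard deviation in the feature vector given to a K-Nearest Neighbor classifier; how the paper combines the three parameters is not stated in the abstract.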

Zou Cai-rong