Extraction of expression from Japanese speech based on time-frequency and fractal features

An extraction method based on time-frequency and fractal features was proposed to analyze intonations in Japanese speech signals. Two parameters were presented to reveal distinct feature patterns: the peak spectral frequency (Fmax) and the fractal dimension (FD) trajectories. Fmax and FD were computed using the short-time Fourier transform (STFT) and Higuchi's method, respectively. Speech data were recorded from 15 Japanese utterances, each spoken with four different ways of expression (accosting, wholehearted, normal, and uninterested). The results showed that the proposed features could statistically distinguish the different intonations from the normal (baseline) intonation.
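The abstract names only the two analysis tools (the STFT for Fmax and Higuchi's method for FD) without giving frame settings, so the following is a minimal Python sketch under assumed parameters: a 25 ms analysis window, a 10 ms hop, and k_max = 8 for Higuchi's method. The function names (higuchi_fd, fmax_trajectory, fd_trajectory) and the parameter values are illustrative, not the authors' implementation.

```python
import numpy as np
from scipy.signal import stft

def higuchi_fd(x, k_max=8):
    """Estimate the fractal dimension of a 1-D signal with Higuchi's method (1988)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    log_k, log_L = [], []
    for k in range(1, k_max + 1):
        Lk = []
        for m in range(k):
            idx = np.arange(m, n, k)          # subsampled series x[m], x[m+k], ...
            if len(idx) < 2:
                continue
            # normalized curve length of the subsampled series
            length = np.sum(np.abs(np.diff(x[idx]))) * (n - 1) / (len(idx) - 1) / k
            Lk.append(length / k)
        log_k.append(np.log(1.0 / k))
        log_L.append(np.log(np.mean(Lk)))
    # FD is the slope of log L(k) versus log(1/k), since L(k) ~ k^(-FD)
    slope, _ = np.polyfit(log_k, log_L, 1)
    return slope

def fmax_trajectory(x, fs, win=0.025, hop=0.010):
    """Per-frame peak spectral frequency (Fmax) from the STFT magnitude."""
    nper = int(win * fs)
    f, t, Z = stft(x, fs=fs, nperseg=nper, noverlap=nper - int(hop * fs))
    return t, f[np.argmax(np.abs(Z), axis=0)]

def fd_trajectory(x, fs, win=0.025, hop=0.010, k_max=8):
    """Per-frame fractal dimension (FD) over a sliding window."""
    n, h = int(win * fs), int(hop * fs)
    return np.array([higuchi_fd(x[i:i + n], k_max)
                     for i in range(0, len(x) - n, h)])
```

Running fmax_trajectory and fd_trajectory on each of the four expression styles would yield the two trajectory features that the paper compares statistically against the normal (baseline) intonation.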
