Voice and emotional expression transformation based on statistics of vowel parameters in an emotional speech database

We propose a simple method for modifying emotional speech sounds. The method aims at a real-time implementation of an emotional expression transformation system based on STRAIGHT. We developed mapping functions for spectra, fundamental frequency (F0), and vowel duration from a statistical analysis of 1500 expressive speech sounds in an emotional speech database. The spectral mapping parameters are first extracted at the centers of vowels and then interpolated with bilinear functions. The spectral frequency warping functions are designed manually. The F0 and duration mapping functions simply transform the average values on a log-frequency scale and a linear time scale, respectively. We demonstrate that the spectral distortion remains sufficiently small when ‘Neutral’ speech sounds are transformed into expressive speech sounds (i.e. ‘Bright’, ‘Excited’, ‘Angry’, and ‘Raging’ speech sounds).
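As a rough sketch of how such average-based F0 and duration mappings could be realized (not the paper's actual implementation), the following Python fragment assumes that per-emotion average log-F0 values and average vowel durations have already been computed from the database statistics; all function names, signatures, and numeric values are illustrative.

```python
import numpy as np

def map_f0(f0_neutral, mean_logf0_neutral, mean_logf0_target):
    """Shift an F0 contour so that its average log-frequency moves from the
    'Neutral' average to the target emotion's average (log-frequency scale)."""
    f0_out = f0_neutral.copy()
    voiced = f0_neutral > 0                       # leave unvoiced frames (F0 = 0) untouched
    shift = mean_logf0_target - mean_logf0_neutral
    f0_out[voiced] = np.exp(np.log(f0_neutral[voiced]) + shift)
    return f0_out

def map_duration(duration_neutral, mean_dur_neutral, mean_dur_target):
    """Scale a vowel duration by the ratio of average durations (linear time scale)."""
    return duration_neutral * (mean_dur_target / mean_dur_neutral)

# Example: a neutral vowel around 120 Hz shifted toward a higher-pitched target
# average (the averages here are placeholders, not values from the database).
f0 = np.array([0.0, 118.0, 121.0, 124.0, 0.0])
f0_transformed = map_f0(f0, mean_logf0_neutral=np.log(120.0),
                        mean_logf0_target=np.log(180.0))
```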