Corpus-based generation of fundamental frequency contours using generation process model and considering emotional focuses

We previously performed emotional speech synthesis using our corpus-based method of generating fundamental frequency (F0) contours from text. The method predicts the command values of the F0 contour generation process model instead of directly predicting the F0 value of each time frame. Better control of F0 contours was realized by taking the emotional level of each bunsetsu into account: information on which bunsetsu(s) the emotion is especially placed on was added to the inputs of the command predictor. In the case of anger, adding emotional levels yielded F0 contours closer to the target contours. Speech was synthesized with F0 contours generated in two ways: from commands predicted with emotional levels taken into account, and from commands predicted without them. The results of a perceptual experiment indicated that emotion was conveyed well when emotional levels were added.

Index Terms: speech synthesis, emotion, F0 contour
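
As an illustration of what "command values" means here, the following minimal sketch assumes the generation process model takes its commonly cited form (the Fujisaki model), in which log F0 is a baseline plus the responses to phrase commands (impulses) and accent commands (step pairs). The constants `ALPHA`, `BETA`, and `GAMMA` are typical values from the literature rather than values stated in this abstract, and the function names and the example commands are hypothetical. The command predictor and its emotional-level inputs are not modelled.

```python
import math

# Assumed constants (commonly used defaults, not taken from this abstract).
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, base_f0, phrase_cmds, accent_cmds):
    """Generate F0 values (Hz) at the given times (s).

    phrase_cmds: list of (onset_time, magnitude)
    accent_cmds: list of (onset_time, offset_time, amplitude)
    """
    contour = []
    for t in times:
        log_f0 = math.log(base_f0)
        for onset, magnitude in phrase_cmds:
            log_f0 += magnitude * phrase_response(t - onset)
        for onset, offset, amplitude in accent_cmds:
            log_f0 += amplitude * (
                accent_response(t - onset) - accent_response(t - offset)
            )
        contour.append(math.exp(log_f0))
    return contour


# Hypothetical commands: one phrase command and one accent command.
times = [i * 0.01 for i in range(200)]  # 2 s at 10 ms frames
contour = f0_contour(times, 120.0, [(0.0, 0.5)], [(0.3, 0.8, 0.4)])
```

Under this formulation a whole contour is determined by a handful of command timings and magnitudes, which is why predicting commands rather than frame-by-frame F0 values gives a compact target for the predictor.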