Control of prosodic focus in corpus-based generation of fundamental frequency contours of Japanese based on the generation process model

A fully corpus-based process for generating prosodic features from text was developed. The process first predicts pauses and phone durations, and then generates F0 contours. Because F0 contour generation is based on the generation process model, the generated F0 contours are easy to manipulate at the command level. A method was developed for generating sentence F0 contours when focus is placed on one of the “bunsetsu” of an utterance. The method predicts the differences in the F0 model commands between utterances with and without focus, and applies them to the F0 model commands predicted beforehand by the baseline method. The validity of the method was confirmed through experiments on F0 contour generation and speech synthesis.
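
The command-level manipulation described above can be illustrated with a minimal sketch. It assumes the commonly cited form of the generation process (Fujisaki) model, in which log F0 is a baseline value plus phrase and accent components, with typical constants alpha = 3.0, beta = 20.0, and gamma = 0.9. The class names, the function names, the command values, and the focus deltas below are all hypothetical and chosen by hand for illustration; in the method summarized here the differences are predicted from a corpus, and that prediction step is not shown.

```python
import math
from dataclasses import dataclass, replace

# Typical constants of the generation process model (assumed, not from the abstract).
ALPHA, BETA, GAMMA = 3.0, 20.0, 0.9


@dataclass(frozen=True)
class PhraseCommand:
    t0: float  # onset time of the impulse (s)
    ap: float  # magnitude


@dataclass(frozen=True)
class AccentCommand:
    t1: float  # onset time of the step (s)
    t2: float  # offset time of the step (s)
    aa: float  # amplitude


def phrase_response(t):
    """Impulse response of the phrase control mechanism."""
    return ALPHA ** 2 * t * math.exp(-ALPHA * t) if t >= 0 else 0.0


def accent_response(t):
    """Step response of the accent control mechanism, capped at GAMMA."""
    return min(1.0 - (1.0 + BETA * t) * math.exp(-BETA * t), GAMMA) if t >= 0 else 0.0


def f0_contour(fb, phrases, accents, times):
    """Return F0 (Hz) at each time: ln F0 = ln Fb + phrase part + accent part."""
    contour = []
    for t in times:
        log_f0 = math.log(fb)
        log_f0 += sum(p.ap * phrase_response(t - p.t0) for p in phrases)
        log_f0 += sum(
            a.aa * (accent_response(t - a.t1) - accent_response(t - a.t2))
            for a in accents
        )
        contour.append(math.exp(log_f0))
    return contour


def apply_focus(accents, amplitude_deltas):
    """Add per-command amplitude differences to baseline accent commands.

    amplitude_deltas maps an accent-command index to the difference between
    the focused and unfocused amplitude; amplitudes are floored at zero.
    """
    return [
        replace(a, aa=max(0.0, a.aa + amplitude_deltas.get(i, 0.0)))
        for i, a in enumerate(accents)
    ]


# Hypothetical baseline commands for an utterance with two accent commands.
fb = 120.0
phrases = [PhraseCommand(t0=0.0, ap=0.5)]
baseline_accents = [AccentCommand(0.2, 0.5, 0.4), AccentCommand(0.7, 1.0, 0.4)]

# Hand-picked deltas: raise the focused command, lower the following one.
focused_accents = apply_focus(baseline_accents, {0: 0.2, 1: -0.15})

times = [0.0, 0.4, 0.9]
baseline_f0 = f0_contour(fb, phrases, baseline_accents, times)
focused_f0 = f0_contour(fb, phrases, focused_accents, times)
```

Because the differences are applied to the commands rather than to the contour itself, the baseline prediction stays untouched and the focused contour is regenerated from the modified commands by the same model.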