Segmental duration control by time delay neural networks with asymmetric causal and retro-causal information flows

ESANN'2002 proceedings, European Symposium on Artificial Neural Networks, Bruges (Belgium), 24-26 April 2002, d-side publi., ISBN 2-930307-02-1, pp. 269-274

The generation of pleasant prosody parameters is very important for speech synthesis. A prosody generation unit can be seen as a dynamical system. In this paper, sophisticated time-delay recurrent neural network (NN) topologies are presented which can be used for the modeling of dynamical systems. Within the prosody prediction task, left and right context information is known to influence the prediction of prosody control parameters. This can be modeled by causal-retro-causal information flows. Since information that is available during training is partially unavailable during application, there is a structural switch from training to application. This structural change of the information flow is handled by two asymmetric architectures. The proposed new architectures allow the integration of further a priori knowledge. With this, we are able to improve the performance of the duration control unit within our text-to-speech (TTS) system Papageno.

Introduction

Our acoustic prosody module consists of a duration control unit and an f0 contour unit. Both are modeled by NNs (see [...]). There are also rule-based duration control methods which, depending on rules, modify the duration of a segment by a multiplicative or additive scaling factor.

Appropriate segmental durations are very important for a natural-sounding synthetic voice. A poorly performing duration control module strongly affects the f0 contour unit. Similar to the f0 contour prediction task, the duration control unit uses left (past) and right (future) contextual information to establish the prediction. The left contextual information is the text already read, while the right contextual information is the text to be read next.

A segmental duration module has to control the rhythm of a synthetic voice and the known effect of final lengthening, so both local and global structures have to be mapped. The state-of-the-art causal-retro-causal modeling was presented in [...].
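
To make the rule-based alternative mentioned in the introduction concrete, the following is a minimal Python sketch of a duration controller that rescales a segment's base duration by multiplicative or additive factors. It is an illustration under stated assumptions, not the method of any cited system: the names `Rule` and `apply_rules`, the feature keys `phrase_final` and `stressed`, and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    """One hypothetical duration rule: a condition plus a scaling factor."""
    applies: Callable[[Dict[str, bool]], bool]  # does the rule fire for this segment?
    factor: float                               # scaling factor (unitless or in ms)
    mode: str                                   # "mul" (multiplicative) or "add" (additive)


def apply_rules(base_ms: float, segment: Dict[str, bool], rules: List[Rule]) -> float:
    """Apply every matching rule, in order, to the base duration in milliseconds."""
    duration = base_ms
    for rule in rules:
        if not rule.applies(segment):
            continue
        if rule.mode == "mul":
            duration *= rule.factor
        elif rule.mode == "add":
            duration += rule.factor
        else:
            raise ValueError(f"unknown rule mode: {rule.mode!r}")
    return duration


# Illustrative rule set (values are assumptions, not taken from the paper).
RULES = [
    # Final lengthening: stretch segments at the end of a phrase.
    Rule(lambda s: s.get("phrase_final", False), 1.5, "mul"),
    # Stress: add a fixed number of milliseconds.
    Rule(lambda s: s.get("stressed", False), 10.0, "add"),
]

if __name__ == "__main__":
    segment = {"phrase_final": True, "stressed": True}
    print(apply_rules(80.0, segment, RULES))
```

Each rule in such a scheme only sees the features of the current segment, whereas the NN approach described in this paper draws on left and right context to capture both local and global structure.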