Symbolic Prosody Modeling by Causal Retro-causal NNs with Variable Context Length

This paper presents the application of causal retro-causal neural networks (NNs) to accent label prediction for speech synthesis. The proposed NN architecture uses gating clusters, which enable the network structure to adapt dynamically to the current input. In the proposed causal retro-causal NN, the gating clusters adapt the network structure so that the network has a variable context length; as a result, only the input feature vectors that are actually available in the current context window are processed. The architecture has been successfully applied to accent label prediction within our text-to-speech (TTS) system. Prediction accuracy is about 83%, which is higher than results achieved with tree-based (CART) methods on a corpus of similar complexity.
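The abstract does not spell out how the gating clusters are realized, so the following is only a minimal sketch of the general idea under stated assumptions: a binary gate per context slot that is closed when the slot falls outside the available input, and two simple recurrences, one running forward (causal) and one running backward (retro-causal), that skip closed slots. The function names, the `weight` parameter, and the tanh update are all hypothetical illustrations, not the architecture used in the paper.

```python
import math


def gate_mask(position, sequence_length, half_window):
    """Return 1.0 for window slots that lie inside the sequence, else 0.0."""
    return [
        1.0 if 0 <= position + offset < sequence_length else 0.0
        for offset in range(-half_window, half_window + 1)
    ]


def gated_window(features, position, half_window):
    """Collect the context window around `position`; closed slots are zeroed."""
    dim = len(features[0])
    mask = gate_mask(position, len(features), half_window)
    window = []
    for gate, offset in zip(mask, range(-half_window, half_window + 1)):
        if gate:
            window.append(list(features[position + offset]))
        else:
            window.append([0.0] * dim)
    return window, mask


def causal_retro_causal_states(window, mask, weight=0.5):
    """Run a forward and a backward recurrence over the open slots only.

    Assumption: a toy scalar state with a tanh update stands in for the
    real network; a closed gate leaves the state unchanged.
    """
    def step(state, vector):
        return math.tanh(weight * state + sum(vector) / len(vector))

    def run(slots):
        states, state = [], 0.0
        for vector, gate in slots:
            if gate:
                state = step(state, vector)
            states.append(state)
        return states

    slots = list(zip(window, mask))
    forward = run(slots)                # causal: past to future
    backward = run(slots[::-1])[::-1]   # retro-causal: future to past
    return forward, backward
```

With a three-word input and a window of one word on each side, the first word has no left neighbour, so `gated_window(features, 0, 1)` closes the leftmost gate and the effective context shrinks from three slots to two.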
