Prominence-Based Prosody Prediction for Unit Selection Speech Synthesis

This paper describes the development and evaluation of a prosody prediction module for unit selection speech synthesis that is based on the notion of perceptual prominence. We outline the design principles of the module and describe its implementation in the Bonn Open Synthesis System (BOSS). We also report the results of perception experiments conducted to evaluate prominence prediction. The paper concludes with a general discussion of the approach and an outline of directions for further work.

Index Terms: speech synthesis, unit selection, perceptual prominence, prosody modeling, metrical phonology
