Evaluating Prosodic Processing for Incremental Speech Synthesis

Incremental speech synthesis (iSS) accepts input and produces output in consecutive chunks that only together form a full utterance. Systems that use iSS can therefore adapt their utterances while they are still being spoken. However, starting to process before the full utterance is available rules out global optimization and can lead to suboptimal solutions. In this paper, we present a method for incrementalizing the symbolic pre-processing component of speech synthesis and assess how varying the “lookahead”, i.e., knowledge about the rest of the utterance, influences prosodic quality. We found that high-quality incremental output can be achieved even with a lookahead of less than one phrase, which allows for timely system reaction.
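The notion of lookahead can be illustrated with a minimal sketch. It is not the method evaluated in the paper: the function name `process_with_lookahead` and the choice of counting lookahead in words are assumptions made purely for illustration. The sketch only shows the trade-off described above, namely that a unit is committed once a fixed amount of right context has arrived, so a smaller lookahead means earlier output but less context for each decision.

```python
from typing import Iterable, Iterator, List, Tuple


def process_with_lookahead(
    words: Iterable[str], lookahead: int
) -> Iterator[Tuple[str, List[str]]]:
    """Yield each word with the right context that is visible when it is committed.

    Hypothetical illustration: a word is committed as soon as `lookahead`
    following words have arrived, or when the input ends (the context is
    then shorter than `lookahead`).
    """
    if lookahead < 0:
        raise ValueError("lookahead must be non-negative")
    buffer: List[str] = []
    for word in words:
        buffer.append(word)
        # Commit the oldest word once enough right context is buffered.
        if len(buffer) > lookahead:
            yield buffer[0], buffer[1:]
            buffer.pop(0)
    # Input has ended: flush the remaining words with whatever context is left.
    while buffer:
        yield buffer[0], buffer[1:]
        buffer.pop(0)


if __name__ == "__main__":
    for word, context in process_with_lookahead("this is a short test".split(), 2):
        print(word, context)
```

With `lookahead=0` every word is committed immediately with no right context; a very large value makes the sketch behave like non-incremental processing, because nothing is committed until the input ends.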
