A Model for Varying Speaking Style in TTS systems

This paper aims to improve a TTS system by enabling it to generate several speaking styles. We first describe three speaking styles (Radio News, Political Address and Conversation) and compare the prosodic features observed in these authentic styles with the prosody of the “neutral” speech produced by the eLite TTS system. The differences involve about 20 prosodic characteristics (F0 span, speech rate, pauses and hesitations, primary and secondary accentuation, schwa deletion, etc.). To bring the neutral speech closer to a given speaking style, these prosodic characteristics are implemented either within the TTS system itself or in a postprocessing step. The quality of the “stylized” synthesis is evaluated by comparing it with the original style.
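
To make two of the prosodic characteristics named above concrete, the following is a minimal sketch of how F0 span and speech rate might be measured for one utterance. It is not the paper's implementation: the function names, the percentile trimming of the F0 contour, and the choice of semitones and syllables per second as units are all assumptions made for illustration.

```python
import math


def f0_span_semitones(f0_hz, low_pct=5.0, high_pct=95.0):
    """F0 span in semitones between two percentiles of the voiced frames.

    Assumption: unvoiced frames are encoded as 0 or None and are ignored.
    The 5th/95th percentile trimming is an illustrative choice to reduce
    the effect of pitch-tracking outliers.
    """
    voiced = sorted(f for f in f0_hz if f and f > 0)
    if len(voiced) < 2:
        raise ValueError("need at least two voiced F0 values")

    def percentile(p):
        # Linear interpolation between the two closest ranks.
        k = (len(voiced) - 1) * p / 100.0
        lo, hi = math.floor(k), math.ceil(k)
        return voiced[lo] + (voiced[hi] - voiced[lo]) * (k - lo)

    # 12 semitones per octave, so the span is 12 * log2(high / low).
    return 12.0 * math.log2(percentile(high_pct) / percentile(low_pct))


def articulation_rate(n_syllables, total_s, pause_s=0.0):
    """Syllables per second, excluding pause time (hypothetical helper)."""
    speaking_s = total_s - pause_s
    if speaking_s <= 0:
        raise ValueError("speaking time must be positive")
    return n_syllables / speaking_s
```

Computing such measures on both an authentic style recording and the neutral synthesis gives per-feature differences of the kind the comparison relies on; for example, `f0_span_semitones(contour)` on each contour, followed by a subtraction, indicates how much the neutral F0 span would have to be widened or narrowed.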
