A joint intelligibility evaluation of French text-to-speech synthesis systems: the EvaSy SUS/ACR campaign

The EVALDA/EvaSy project is dedicated to the evaluation of text-to-speech synthesis systems for the French language. It is subdivided into four components: evaluation of the grapheme-to-phoneme conversion module (Boula de Mareüil et al., 2005), evaluation of prosody (Garcia et al., 2006), evaluation of intelligibility, and global evaluation of the quality of the synthesised speech. This paper reports the key results of the intelligibility and global quality evaluations. It focuses on intelligibility, assessed with semantically unpredictable sentences (SUS), and also compares these results with absolute category rating (ACR) scores on scales such as pleasantness and naturalness. Three diphone systems and three selection systems were evaluated. The most intelligible system, which is diphone-based, turns out to be far from the one with the best mean opinion score.

[1]  Sebastian Möller,et al.  Assessment and Prediction of Speech Quality in Telecommunications , 2000 .

[2]  Marc Brysbaert,et al.  Lexique 2 : A new French lexical database , 2004, Behavior research methods, instruments, & computers : a journal of the Psychonomic Society, Inc.

[3]  P. Mousty,et al.  Brulex: une base de données lexicales informatisée pour le français écrit et parlé , 1990 .

[4]  P. Mareuil Étude linguistique appliquée à la synthèse de la parole à partir du texte , 1997 .
[5]  Martine Grice,et al.  The SUS test: A method for the assessment of text-to-speech synthesis intelligibility using Semantically Unpredictable Sentences , 1996, Speech Commun..

[6]  Jean Véronis,et al.  A multilingual prosodic database , 1998, ICSLP.

[7]  Christophe d'Alessandro,et al.  Evaluating the pronunciation of proper names by four French grapheme-to-phoneme converters , 2005, INTERSPEECH.

[8]  Max Giardina,et al.  L’évaluation des SAMI (système d’apprentissage multimédia interactif) : de la théorie à la pratique , 1998 .

[9]  Christian Benoît,et al.  An intelligibility test using semantically unpredictable sentences: towards the quantification of linguistic complexity , 1990, Speech Commun..

[10]  Olivier Cappé,et al.  Synthèse de la parole à partir du texte , 1996 .

[11]  Nick Campbell,et al.  No laughing matter , 2005, INTERSPEECH.

[12]  Alexander Raake,et al.  SUS-based Method for Speech Reception Threshold Measurement in French , 2006, LREC.