Paper Information - Learning to Say It Well: Reranking Realizations by Predicted Synthesis Quality

This paper presents a method for adapting a language generator to the strengths and weaknesses of a synthetic voice, thereby improving the naturalness of synthetic speech in a spoken language dialogue system. The method trains a discriminative reranker to select paraphrases that are predicted to sound natural when synthesized. The ranker is trained on realizer and synthesizer features in supervised fashion, using human judgements of synthetic voice quality on a sample of the paraphrases representative of the generator's capability. Results from a cross-validation study indicate that discriminative paraphrase reranking can achieve substantial improvements in naturalness on average, ameliorating the problem of highly variable synthesis quality typically encountered with today's unit selection synthesizers.
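The reranking idea in the abstract can be illustrated with a minimal sketch. The abstract does not name the learning algorithm or the exact features, so everything below is an assumption made for illustration: a simple pairwise perceptron stands in for the discriminative ranker, each paraphrase has two hypothetical features (a realizer language-model score and a synthesizer join cost), and the ratings are made-up toy values rather than data from the paper.

```python
from itertools import combinations


def dot(w, x):
    """Inner product of a weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise_ranker(groups, epochs=20):
    """Learn weights so that higher-rated paraphrases score higher.

    groups: list of paraphrase sets; each set is a list of
            (features, human_rating) pairs for one input meaning.
    """
    dim = len(groups[0][0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        for group in groups:
            # Preferences are only formed between paraphrases of the same input.
            for (xa, ra), (xb, rb) in combinations(group, 2):
                if ra == rb:
                    continue  # ties carry no preference
                better, worse = (xa, xb) if ra > rb else (xb, xa)
                diff = [p - q for p, q in zip(better, worse)]
                if dot(w, diff) <= 0:  # pair is misordered: perceptron update
                    w = [wi + di for wi, di in zip(w, diff)]
    return w


def rerank(w, candidates):
    """candidates: list of (text, features); best-scoring paraphrase first."""
    return sorted(candidates, key=lambda c: dot(w, c[1]), reverse=True)


# Hypothetical toy data: [realizer LM score, synthesizer join cost], rating 1-5.
training_groups = [
    [([0.9, 0.8], 2.0), ([0.7, 0.2], 4.5), ([0.8, 0.5], 3.0)],
    [([0.6, 0.9], 1.5), ([0.5, 0.3], 4.0)],
]
weights = train_pairwise_ranker(training_groups)

# Hypothetical paraphrases of one dialogue-system utterance.
ranked = rerank(weights, [
    ("paraphrase A", [0.9, 0.7]),
    ("paraphrase B", [0.6, 0.1]),
])
best_text = ranked[0][0]
```

In this toy setup the learned weight on join cost comes out negative, so the paraphrase with smoother predicted unit joins is preferred even though its realizer score is lower. A real system would use a richer feature set and a stronger ranking learner.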

Michael White | Crystal Nakatsu
