A pronunciation-by-analogy module for the Festival Text-to-Speech Synthesiser

Pronunciation by analogy (PbA) is a data-driven technique for the automatic phonemisation of text that is receiving renewed attention from researchers in text-to-speech synthesis. The dictionary that serves as the primary source of pronunciations, via direct look-up, is reused as a secondary source of information about the pronunciation of words it does not contain. In this paper, we provide theoretical and empirical motivations for the use of PbA, review approaches to automatic pronunciation generation by analogy, and report on the implementation of a PbA module for the Festival text-to-speech synthesiser. We have used a much larger dictionary than in previous work: British English Example Pronunciation (BEEP), with approximately 200,000 words. Our best-performing PbA implementation obtains a new result of 86.7% words correct on this dictionary. The Festival PbA module is still under development, however, and currently performs less well.
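The two roles of the dictionary described above (direct look-up first, analogy second) can be illustrated with a minimal sketch. It is not the Festival module or the implementation evaluated in the paper. It assumes a tiny hypothetical lexicon in which every letter is already aligned with exactly one phoneme, and it loosely follows the lattice idea common in the PbA literature: substrings shared between the unknown word and dictionary entries become arcs, and a path with the fewest arcs gives the pronunciation. The names `LEXICON`, `build_arcs`, and `pronounce` are illustrative only, and the frequency-based scoring of tied paths used in real systems is omitted.

```python
from collections import defaultdict, deque

# Hypothetical toy lexicon: each letter is aligned with exactly one phoneme.
LEXICON = {
    "cat": ("k", "a", "t"),
    "hat": ("h", "a", "t"),
    "can": ("k", "a", "n"),
    "hen": ("h", "e", "n"),
}


def build_arcs(word, lexicon):
    """Collect lattice arcs from substrings shared with dictionary entries.

    Nodes are (letter position, phoneme) pairs; '#' marks word boundaries.
    Each arc stores the phonemes strictly between its two end nodes.
    """
    target = "#" + word + "#"
    arcs = defaultdict(set)
    for entry, phones in lexicon.items():
        letters = "#" + entry + "#"
        aligned = ("#",) + tuple(phones) + ("#",)
        for i in range(len(target)):
            for j in range(i + 1, len(target)):
                chunk = target[i:j + 1]  # at least two symbols long
                start = letters.find(chunk)
                while start != -1:
                    seg = aligned[start:start + len(chunk)]
                    arcs[(i, seg[0])].add(((j, seg[-1]), seg[1:-1]))
                    start = letters.find(chunk, start + 1)
    return arcs


def pronounce(word, lexicon=LEXICON):
    """Primary source: direct look-up. Secondary source: analogy."""
    if word in lexicon:
        return list(lexicon[word])
    arcs = build_arcs(word, lexicon)
    start, goal = (0, "#"), (len(word) + 1, "#")
    queue = deque([(start, [])])
    seen = {start}
    # Breadth-first search returns a path with the fewest arcs.
    while queue:
        node, phones = queue.popleft()
        if node == goal:
            return phones
        for nxt, interior in sorted(arcs.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                tail = [] if nxt == goal else [nxt[1]]
                queue.append((nxt, phones + list(interior) + tail))
    return None  # no complete path: the word cannot be pronounced by analogy


print(pronounce("cat"))  # found by direct look-up
print(pronounce("han"))  # assembled by analogy with "hat" and "can"
```

In this sketch an unknown word that shares no usable substrings with the lexicon yields `None` rather than a guess; a production system would need a fallback strategy at that point.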
