Aligning letters and phonemes for speech synthesis

A common requirement in speech technology is to align two different symbolic representations of the same linguistic ‘message’. For instance, we often need to align the letters of words listed in a dictionary with the corresponding phonemes specifying their pronunciation. As dictionaries grow ever larger, manual alignment becomes less and less tenable, yet automatic alignment is a hard problem for a language like English. In this paper, we describe the use of a form of the expectation-maximization (EM) algorithm to align English text and phonemes automatically. The quality of alignment is assessed by the performance of a pronunciation-by-analogy system that uses the aligned dictionary data. We find excellent performance (the best so far reported in the literature on letter-phoneme conversion), independent of the start point for alignment, indicating that the EM search space is strongly convex.
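
The abstract does not spell out the algorithm, so the following is only a minimal sketch of EM-style letter-phoneme alignment under stated assumptions, not the authors' exact procedure. It assumes that each letter maps to exactly one phoneme or to a null symbol `_` (NETtalk-style), that alignments are monotonic, and that no phoneme string is longer than its spelling (real English violates this for letters such as "x"; one common workaround is a combined phoneme symbol). EM is approximated by "hard" (Viterbi) re-estimation: each iteration aligns every dictionary entry by dynamic programming under the current estimates of P(phoneme | letter), then re-counts the aligned pairs. The start point (every letter co-occurring with every phoneme of its word), the smoothing constants, and the names `em_align`, `align` and `NULL` are hypothetical.

```python
import math
from collections import defaultdict

NULL = "_"  # hypothetical null-phoneme symbol for silent letters


def log_prob(probs, letter, phone, floor=1e-6):
    """log P(phone | letter), with a small floor for pairs never seen in the counts."""
    return math.log(probs.get(letter, {}).get(phone, floor))


def normalise(counts, smoothing=1e-3):
    """Convert letter -> phoneme counts into conditional probabilities P(phone | letter)."""
    probs = {}
    for letter, row in counts.items():
        total = sum(row.values()) + smoothing * len(row)
        probs[letter] = {p: (c + smoothing) / total for p, c in row.items()}
    return probs


def initial_counts(pairs):
    """Assumed start point: each letter co-occurs with NULL and with every phoneme of its word."""
    counts = defaultdict(lambda: defaultdict(float))
    for word, phones in pairs:
        for letter in word:
            counts[letter][NULL] += 1.0
            for phone in phones:
                counts[letter][phone] += 1.0
    return counts


def align(word, phones, probs):
    """Best monotonic alignment giving each letter one phoneme or NULL (dynamic programming)."""
    n, m = len(word), len(phones)
    if m > n:
        raise ValueError("sketch assumes no more phonemes than letters")
    neg = float("-inf")
    best = [[neg] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        letter = word[i - 1]
        for j in range(min(i, m) + 1):
            # Letter i is silent: it aligns to NULL and consumes no phoneme.
            if best[i - 1][j] > neg:
                s = best[i - 1][j] + log_prob(probs, letter, NULL)
                if s > best[i][j]:
                    best[i][j], back[i][j] = s, "null"
            # Letter i emits phoneme j.
            if j > 0 and best[i - 1][j - 1] > neg:
                s = best[i - 1][j - 1] + log_prob(probs, letter, phones[j - 1])
                if s > best[i][j]:
                    best[i][j], back[i][j] = s, "match"
    # Trace back from the cell that has used every letter and every phoneme.
    aligned, j = [], m
    for i in range(n, 0, -1):
        if back[i][j] == "match":
            aligned.append(phones[j - 1])
            j -= 1
        else:
            aligned.append(NULL)
    return aligned[::-1]


def em_align(pairs, iterations=10):
    """Hard (Viterbi-style) EM: realign all entries, re-count aligned pairs, repeat."""
    probs = normalise(initial_counts(pairs))
    alignments = None
    for _ in range(iterations):
        new = [align(word, phones, probs) for word, phones in pairs]
        if new == alignments:
            break  # no alignment changed, so the counts would not change either
        alignments = new
        counts = defaultdict(lambda: defaultdict(float))
        for (word, _phones), aligned in zip(pairs, alignments):
            for letter, phone in zip(word, aligned):
                counts[letter][phone] += 1.0
        probs = normalise(counts)
    return alignments


if __name__ == "__main__":
    lexicon = [("bat", ["b", "a", "t"]), ("bate", ["b", "e", "t"]), ("make", ["m", "e", "k"])]
    for (word, _phones), aligned in zip(lexicon, em_align(lexicon)):
        print(word, aligned)
```

Hard re-estimation keeps the sketch short; a full EM would instead accumulate expected (soft) counts over all alignments of each entry, for example with a forward-backward pass over the same dynamic-programming lattice. Entries whose spelling and pronunciation have equal length admit only one alignment under these assumptions, so they anchor the probabilities used to place nulls in the remaining entries.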

Robert I. Damper | Yannick Marchand | Alex I. Bazin | John-David Marseters
