Letter-to-phoneme conversion by inference of rewriting rules

Phonetization is a crucial step in oral document processing. In this paper, a new letter-to-phoneme conversion approach is proposed; it is automatic, simple, portable and efficient. It relies on a machine learning technique initially developed for transliteration and translation: the system infers rewriting rules from examples of words paired with their phonetic representations. This approach is evaluated in the framework of the Pronalsyl Pascal challenge, which includes several datasets in different languages. The results obtained equal or outperform those of the best known systems. Moreover, thanks to the simplicity of our technique, its inference time is much lower than that of the best-performing state-of-the-art systems.

Index Terms: phonetization, inference of rewriting rules, phonemization, grapheme-to-phoneme, Pronalsyl Challenge.
