Automatic Phonetic Transcription by Phonological Derivation

Automatic phonetic transcription tools usually derive phonetic transcriptions directly from orthographic representations. Although these approaches often achieve good results, theoretical studies suggest that incorporating morphophonological knowledge can improve their performance. Following this idea, we developed a tool that first obtains an underlying representation of each word, using small lexica and dedicated lemmatizers. For each representation, a phonological derivation then generates the phonetic transcription by applying linguistically motivated rules. Since most of these rules are implemented as optional parameters, the system can generate dialect-specific transcriptions. The system is more than a grapheme-to-phone tool: it also produces phonological representations and evaluates several linguistic processes that occur during the derivation. Preliminary experiments emulating a phonological system of Galician (using words spelled in European Portuguese as input) show that the underlying representation of most words can be obtained using small lexica, and that the derivation produces high-quality phonetic transcriptions.
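The two-stage design described above (lexicon lookup of an underlying representation, followed by an ordered derivation in which some rules are optional) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `Rule`, `LEXICON`, `RULES` and `derive`, the two toy lexicon entries, and the three simplified rules are hypothetical and are not the rule set, lexica, or lemmatizers of the system described here. The optional rules are loosely modelled on two well-known Galician dialectal features (seseo and gheada).

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One rewrite rule of the derivation (hypothetical structure)."""
    name: str
    pattern: str          # regular expression matched against the current form
    replacement: str
    optional: bool = False  # optional rules apply only when a dialect enables them


# Toy lexicon: orthographic word -> assumed underlying representation.
LEXICON = {"gato": "gato", "cinco": "θinko"}

# Rules are applied in list order; the order itself is part of the sketch.
RULES = [
    Rule("seseo", "θ", "s", optional=True),            # /θ/ merges into [s]
    Rule("gheada", "g", "ħ", optional=True),           # /g/ realised as [ħ]
    Rule("nasal_velarization", r"n(?=k)", "ŋ"),        # /n/ -> [ŋ] before /k/
]


def derive(word, enabled=frozenset()):
    """Return (phonetic form, derivation steps) for a word in the toy lexicon."""
    form = LEXICON[word]
    steps = [("underlying", form)]
    for rule in RULES:
        if rule.optional and rule.name not in enabled:
            continue  # dialect does not select this optional rule
        new_form = re.sub(rule.pattern, rule.replacement, form)
        if new_form != form:
            steps.append((rule.name, new_form))  # record only rules that applied
            form = new_form
    return form, steps


if __name__ == "__main__":
    print(derive("cinco"))
    print(derive("cinco", enabled={"seseo"}))
    print(derive("gato", enabled={"gheada"}))
```

Keeping the list of intermediate steps, rather than only the final string, reflects the idea that the derivation itself can be inspected; enabling a different set of optional rule names stands in for selecting a dialect.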
