Language identification of individual words with joint sequence models

Abstract Within a multilingual automatic speech recognition (ASR) system, knowledge of the language of origin of unknown words can improve pronunciation modelling accuracy. This is of particular importance for ASR systems required to deal with code-switched speech or proper names of foreign origin. For words that occur in the language model but not in the pronunciation lexicon, text-based language identification (T-LID) of a single word in isolation may be required. This is a challenging task, especially for short words. We motivate the importance of accurate T-LID in speech processing systems and introduce a novel way of applying Joint Sequence Models (JSMs) to the T-LID task. We obtain competitive results on a real-world four-language task: our best JSM system achieves an F-measure of 97.2%, compared to an F-measure of 95.2% obtained with a state-of-the-art Support Vector Machine (SVM).

Index Terms: text-based language identification, joint sequence models, multilingual speech recognition
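To make the single-word T-LID task concrete: a word's grapheme string is scored under a model for each candidate language, and the best-scoring language is returned. The following is a minimal sketch of that task, assuming a simple per-language character trigram model with add-one smoothing. This is a swapped-in baseline, not the Joint Sequence Model approach introduced in this paper. The class name `CharNgramLID`, the boundary symbols `<` and `>`, and the toy English/Afrikaans word lists are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class CharNgramLID:
    """Hypothetical baseline: per-language character n-gram models with add-one smoothing.

    Not the JSM method of the paper; only an illustration of single-word T-LID.
    """

    def __init__(self, n=3):
        self.n = n
        self.ngram_counts = defaultdict(Counter)    # lang -> counts of n-grams
        self.context_counts = defaultdict(Counter)  # lang -> counts of (n-1)-gram contexts
        self.alphabet = set()                       # symbols seen in training (incl. markers)

    def _pad(self, word):
        # Boundary markers let the model capture typical word prefixes and suffixes.
        return "<" * (self.n - 1) + word.lower() + ">"

    def train(self, lang, words):
        for word in words:
            padded = self._pad(word)
            self.alphabet.update(padded)
            for i in range(self.n - 1, len(padded)):
                ngram = padded[i - self.n + 1 : i + 1]
                self.ngram_counts[lang][ngram] += 1
                self.context_counts[lang][ngram[:-1]] += 1

    def log_prob(self, lang, word):
        # Sum of log P(char | previous n-1 chars) with add-one smoothing over the alphabet.
        padded = self._pad(word)
        v = len(self.alphabet)
        total = 0.0
        for i in range(self.n - 1, len(padded)):
            ngram = padded[i - self.n + 1 : i + 1]
            num = self.ngram_counts[lang][ngram] + 1
            den = self.context_counts[lang][ngram[:-1]] + v
            total += math.log(num / den)
        return total

    def classify(self, word):
        # Return the trained language whose model gives the word the highest log-probability.
        return max(self.ngram_counts, key=lambda lang: self.log_prob(lang, word))


if __name__ == "__main__":
    model = CharNgramLID(n=3)
    # Toy training lists (hypothetical, far too small for real use).
    model.train("eng", ["the", "there", "other", "weather", "thing", "rather"])
    model.train("afr", ["die", "nie", "vier", "drie", "hier", "mier"])
    print(model.classify("theory"))  # eng
    print(model.classify("bier"))    # afr
```

Because a short word yields only a handful of n-grams, the classifier receives very little evidence per decision, which is one reason T-LID of individual words is harder than identifying the language of longer text.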
