Multilingual phone clustering for recognition of spontaneous Indonesian speech utilising pronunciation modelling techniques

In this paper, a multilingual acoustic model set derived from English, Hindi, and Spanish is used to recognise Indonesian speech. To achieve this, we adopt a two-tiered approach to the cross-lingual porting of the multilingual models to a new language. In the first stage, we use an entropy-based decision tree to merge similar phones from different languages into clusters, forming a new multilingual model set. In the second stage, we propose a cross-lingual pronunciation modelling technique to perform the mapping from the multilingual models to the Indonesian phone set. A set of mapping rules is derived from this process and is used to convert the original Indonesian lexicon into a pronunciation lexicon expressed in terms of the multilingual model set. Preliminary experimental results show that, compared to the common knowledge-based approach, both of these techniques reduce the word error rate in a spontaneous speech recognition task.
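As a rough illustration of the lexicon conversion step, the sketch below rewrites each Indonesian pronunciation in terms of multilingual model labels. It assumes simple one-to-one, context-independent rules; the labels (`ML_s`, `ML_a`, ...), the example words, and the function names are hypothetical placeholders, not the rules or the lexicon derived in the paper, and the actual mapping rules may well be context-dependent.

```python
from typing import Dict, List

# Hypothetical mapping rules: Indonesian phone -> multilingual model label.
# These labels are illustrative only and are not taken from the paper.
MAPPING_RULES: Dict[str, str] = {
    "s": "ML_s",
    "a": "ML_a",
    "t": "ML_t",
    "u": "ML_u",
}


def map_pronunciation(phones: List[str], rules: Dict[str, str]) -> List[str]:
    """Rewrite one pronunciation using the mapping rules."""
    missing = [p for p in phones if p not in rules]
    if missing:
        # Fail loudly rather than silently dropping unmapped phones.
        raise KeyError(f"no mapping rule for phones: {missing}")
    return [rules[p] for p in phones]


def convert_lexicon(
    lexicon: Dict[str, List[str]], rules: Dict[str, str]
) -> Dict[str, List[str]]:
    """Convert every entry of a word -> phone-sequence lexicon."""
    return {word: map_pronunciation(phones, rules) for word, phones in lexicon.items()}


# Toy Indonesian lexicon (assumed phone transcriptions).
indonesian_lexicon = {
    "satu": ["s", "a", "t", "u"],
    "tua": ["t", "u", "a"],
}

multilingual_lexicon = convert_lexicon(indonesian_lexicon, MAPPING_RULES)
```

Raising an error on an unmapped phone is a design choice made for this sketch: it makes gaps in the rule set visible instead of producing a truncated pronunciation.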
