Language identification and multilingual speech recognition using discriminatively trained acoustic models

We perform language identification experiments for four prominent South African languages using a multilingual speech recognition system. Specifically, we show how successfully Afrikaans, English, Xhosa and Zulu can be identified using a single set of HMMs and a single recognition pass. We further demonstrate the effect of language identification-specific discriminative acoustic model training on both the per-language recognition accuracy and the accuracy of the language identification process. Experiments indicate that discriminative training leads to a small overall improvement in language identification accuracy without strongly affecting speech recognition performance. Furthermore, language identification is found to be more error-prone, and discriminative training less effective, for code-mixed utterances, indicating that these may require special treatment within a multilingual speech recognition system.
