Brno University of Technology System for NIST 2005 Language Recognition Evaluation

This paper presents the language identification (LID) system developed in Speech@FIT group at Brno University of Technology (BUT) for NIST 2005 Language Recognition Evaluation. The system consists of two parts: phonotactic and acoustic. Phonotactic system is based on hybrid phoneme recognizers trained on SpeechDat-E database. Phoneme lattices are used to train and test phonotactic language models. Further improvement is obtained by using anti-models. Acoustic system is based on GMM modeling trained under maximum mutual information framework. We describe both parts and provide a discussion of performance on LRE 2005 recognition task

[1]  Pavel Matejka,et al.  Towards Lower Error Rates in Phoneme Recognition , 2004, TSD.

[2]  Lukás Burget,et al.  Use of Anti-Models to Further Improve State-of-the-Art PRLM Language Recognition System , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[3]  Jean-Luc Gauvain,et al.  Language recognition using phone latices , 2004, INTERSPEECH.

[4]  Ian H. Witten,et al.  The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression , 1991, IEEE Trans. Inf. Theory.

[5]  William M. Campbell,et al.  Acoustic, phonetic, and discriminative approaches to automatic language identification , 2003, INTERSPEECH.

[6]  Lukás Burget,et al.  Discriminative Training Techniques for Acoustic Language Identification , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[7]  Pavel Matejka,et al.  Phonotactic language identification using high quality phoneme recognition , 2005, INTERSPEECH.

[8]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.

[9]  Jordan Cohen,et al.  Vocal tract normalization in speech recognition: Compensating for systematic speaker variability , 1995 .

[10]  Jonathan Le Roux,et al.  Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[11]  Pavel Matejka,et al.  Hierarchical Structures of Neural Networks for Phoneme Recognition , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[12]  Steve Young,et al.  The HTK book , 1995 .

[13]  Marc A. Zissman,et al.  Comparison of : Four Approaches to Automatic Language Identification of Telephone Speech , 2004 .

[14]  Douglas A. Reynolds,et al.  Approaches to language identification using Gaussian mixture models and shifted delta cepstral features , 2002, INTERSPEECH.

[15]  Alvin F. Martin,et al.  NIST 2003 language recognition evaluation , 2003, INTERSPEECH.