Context-dependent phone models and models adaptation for phonotactic language recognition

The performance of a PPRLM language recognition system depends on the quality and consistency of its phone decoders. To improve the decoders, this paper investigates the use of context-dependent instead of context-independent phone models, and the use of CMLLR for model adaptation. It also discusses several improvements to the LIMSI 2007 NIST LRE system, including a 4-gram language model, score calibration and fusion using the FoCal Multi-class toolkit (with large development data), and better decoding parameters such as the phone insertion penalty. The improved system is evaluated on the NIST LRE-2005 and LRE-2007 evaluation data sets. Despite its simplicity, the system achieves a Cavg of 2.4% and 1.6%, respectively, on these data sets for the 30s condition.
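
For readers unfamiliar with the Cavg metric quoted above, the following is a minimal sketch of how an average detection cost of this kind can be computed from per-language error rates. It assumes the closed-set NIST LRE form with equal miss and false-alarm costs and a target prior of 0.5; the abstract does not state the exact scoring configuration, so these defaults, the function name `cavg`, and the input layout are illustrative assumptions rather than the authors' procedure.

```python
def cavg(p_miss, p_fa, p_target=0.5, c_miss=1.0, c_fa=1.0):
    """Average detection cost over all target languages (assumed closed-set form).

    p_miss[t]   : miss rate when t is the target language
    p_fa[t][n]  : false-alarm rate of the detector for target t
                  on segments whose true language is n (n != t)
    The default costs and prior are assumptions, not taken from the paper.
    """
    langs = list(p_miss)
    n_langs = len(langs)
    if n_langs < 2:
        raise ValueError("need at least two languages")
    # The non-target prior is shared equally among the other languages.
    p_nontarget = (1.0 - p_target) / (n_langs - 1)
    total = 0.0
    for t in langs:
        cost = c_miss * p_target * p_miss[t]
        cost += sum(c_fa * p_nontarget * p_fa[t][n] for n in langs if n != t)
        total += cost
    return total / n_langs


# Hypothetical error rates for two languages, for illustration only.
example_miss = {"a": 0.1, "b": 0.2}
example_fa = {"a": {"b": 0.3}, "b": {"a": 0.1}}
example_cavg = cavg(example_miss, example_fa)
```

With this form, a reported Cavg of 2.4% corresponds to a return value of 0.024.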
