Exploring PPRLM performance for NIST 2005 Language Recognition Evaluation

In language recognition, parallel phone recognition followed by language modelling (PPRLM) is one of the most widespread approaches. Although all PPRLM systems are based on the same ideas, their performance depends heavily on multiple design parameters that must be chosen. As part of our preparation for the 2005 NIST Language Recognition Evaluation, we have explored the effect of some of these parameters. Some are very common in the design of PPRLM systems, such as the number of underlying phonetic recognisers, the normalisations used, or the amount of training data available. Others, such as using unlabelled speech to train the phonetic recognisers or changing the complexity of the phonetic recognisers, are less common and provide ways to achieve slight improvements without more labelled speech.
