Comparing different model configurations for language identification using a phonotactic approach

This paper explores different model configurations for language identification (LID) using a phonotactic approach. Identification experiments were carried out on the 11-language telephone speech corpus OGI-TS, which contains calls in French, English, German, Spanish, Japanese, Korean, Mandarin, Tamil, Farsi, Hindi, and Vietnamese. Phone sequences output by one or more phone recognizers are rescored with language-dependent phonotactic models approximated by phone bigrams. The parameters of the different sets of acoustic phone models were estimated on the 4-language IDEAL corpus. Sets of language-specific phonotactic models were trained on the training portion of the OGI-TS corpus. Error rates are significantly reduced by combining language-dependent and language-independent acoustic decoders, especially for short segments. A 9.9% LID error rate was obtained on the 11-language task using phonotactic models trained on spontaneous speech data. These results show that the phonotactic approach is relatively insensitive to an acoustic mismatch between training and test conditions.
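
The rescoring step can be illustrated with a minimal sketch: one phone bigram model is estimated per language, a decoded phone sequence is scored under each model, and the language with the highest log-likelihood is selected. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the add-alpha smoothing, the boundary symbols, and the toy phone sequences are hypothetical, and the sketch covers only a single phone decoder rather than the combination of decoders described above.

```python
import math
from collections import defaultdict

BOS, EOS = "<s>", "</s>"  # hypothetical sequence-boundary symbols


def train_bigram(sequences, vocab, alpha=1.0):
    """Estimate add-alpha smoothed log P(cur | prev) from phone sequences.

    The smoothing scheme is an assumption; the abstract does not state
    how the bigram probabilities were estimated.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        padded = [BOS] + list(seq) + [EOS]
        for prev, cur in zip(padded, padded[1:]):
            counts[prev][cur] += 1.0

    targets = sorted(set(vocab) | {EOS})   # symbols that can be predicted
    contexts = sorted(set(vocab) | {BOS})  # symbols that can be conditioned on
    model = {}
    for prev in contexts:
        total = sum(counts[prev].values()) + alpha * len(targets)
        model[prev] = {
            cur: math.log((counts[prev][cur] + alpha) / total)
            for cur in targets
        }
    return model


def score(model, seq):
    """Log-likelihood of a phone sequence under one language's bigram model.

    Phones outside the training vocabulary raise KeyError in this sketch.
    """
    padded = [BOS] + list(seq) + [EOS]
    return sum(model[prev][cur] for prev, cur in zip(padded, padded[1:]))


def identify(models, seq):
    """Return the language whose phonotactic model scores the sequence highest."""
    return max(models, key=lambda lang: score(models[lang], seq))


# Toy data with made-up phone symbols and language labels.
vocab = ["a", "k", "s", "t"]
training = {
    "lang_A": [["k", "a", "t", "a"], ["t", "a", "k", "a"]],
    "lang_B": [["s", "t", "s", "k"], ["k", "s", "t", "s"]],
}
models = {lang: train_bigram(seqs, vocab) for lang, seqs in training.items()}
```

Calling `identify(models, decoded_phones)` on a decoded phone sequence returns the best-scoring language label. With several phone recognizers, one plausible extension (again an assumption) would be to sum the per-decoder log-likelihoods for each language before taking the maximum.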