Paper information - Multi-lingual Phoneme Recognition and Language Identification Using Phonotactic Information

Previous research indicates that automatic language identification systems based on phonotactic information outperform systems based on acoustic or prosodic information. This paper investigates two approaches that use phonotactic information: parallel phoneme recognition followed by language modeling (PPRLM) and multi-lingual PRLM. In the PPRLM approach, we modify the system to use four language models, each with a different discounting method: linear, absolute, Good-Turing, and Witten-Bell. Our results show that the modified PPRLM system with Witten-Bell discounting outperforms the other systems and achieves 75.5% language identification accuracy on the OGI-TS speech corpus.
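As a rough illustration of the language-modeling back end described in the abstract, the sketch below trains one Witten-Bell-smoothed bigram model per language over phoneme sequences and picks the language whose model gives a test sequence the highest log-likelihood. It is not the authors' implementation. The bigram order, the class and function names (`WittenBellBigram`, `identify`), the four-symbol phoneme inventory, and the toy training strings are all assumptions made for the example. A real PRLM system would take its sequences from a phoneme recognizer, and PPRLM would add up such scores across several recognizers.

```python
import math
from collections import Counter, defaultdict


class WittenBellBigram:
    """Interpolated Witten-Bell bigram model over phoneme symbols (illustrative)."""

    def __init__(self, sequences, vocab):
        self.vocab_size = len(vocab)          # fixed phoneme inventory size
        self.unigrams = Counter()             # c(w)
        self.followers = defaultdict(Counter) # c(prev, w), grouped by prev
        for seq in sequences:
            self.unigrams.update(seq)
            for prev, cur in zip(seq, seq[1:]):
                self.followers[prev][cur] += 1
        self.total = sum(self.unigrams.values())

    def p_unigram(self, w):
        # Witten-Bell backed off to a uniform distribution over the inventory:
        # the mass reserved for unseen symbols grows with the number of seen types.
        types = len(self.unigrams)
        return (self.unigrams[w] + types / self.vocab_size) / (self.total + types)

    def p_bigram(self, prev, w):
        ctx = self.followers.get(prev)
        if not ctx:
            # Context never seen in training: fall back to the unigram estimate.
            return self.p_unigram(w)
        n = sum(ctx.values())   # tokens observed after prev
        types = len(ctx)        # distinct symbols observed after prev
        return (ctx[w] + types * self.p_unigram(w)) / (n + types)

    def log_likelihood(self, seq):
        return sum(math.log(self.p_bigram(p, w)) for p, w in zip(seq, seq[1:]))


def identify(models, seq):
    """Return the language label whose model scores seq highest."""
    return max(models, key=lambda lang: models[lang].log_likelihood(seq))


# Toy phoneme inventory and training data (hypothetical, for illustration only).
VOCAB = ["a", "k", "s", "t"]
TRAIN = {
    "A": [list("kataka"), list("takata")],
    "B": [list("tstsats"), list("ststsa")],
}
models = {lang: WittenBellBigram(seqs, VOCAB) for lang, seqs in TRAIN.items()}

if __name__ == "__main__":
    print(identify(models, list("kata")))
    print(identify(models, list("stsa")))
```

Because the reserved probability mass in each context depends on how many distinct phonemes followed it, contexts with varied continuations give more weight to the backoff estimate than contexts that were always followed by the same phoneme.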

Eliathamby Ambikairajah | Eric H. C. Choi | Liang Wang
