Multi-lingual Phoneme Recognition and Language Identification Using Phonotactic Information

Previous research indicates that automatic language identification systems based on phonotactic information outperform systems based on acoustic or prosodic information. This paper investigates two approaches that use phonotactic information: parallel phoneme recognition followed by language modeling (PPRLM) and multi-lingual PRLM. In the PPRLM approach, we modify the system by using four language models, each with a different discounting method: linear, absolute, Good-Turing, and Witten-Bell. Our results show that the modified PPRLM system with Witten-Bell discounting outperforms the other systems and achieves 75.5% language identification accuracy on the OGI-TS speech corpus.
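
To illustrate the language-modeling stage, the following is a minimal sketch of a single PRLM stream with Witten-Bell discounting. It is not the system evaluated here: it assumes phoneme sequences have already been decoded by a recognizer, assumes a bigram model (the n-gram order is not stated above), and uses hypothetical names (`WittenBellBigram`, `identify`) and toy data chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


class WittenBellBigram:
    """Interpolated Witten-Bell bigram model over phoneme symbols (illustrative)."""

    def __init__(self, sequences, vocab):
        # vocab is assumed to contain every phoneme seen in training.
        self.vocab = set(vocab)
        self.unigrams = Counter()
        self.bigrams = defaultdict(Counter)
        for seq in sequences:
            self.unigrams.update(seq)
            for prev, cur in zip(seq, seq[1:]):
                self.bigrams[prev][cur] += 1
        self.total = sum(self.unigrams.values())

    def p_unigram(self, w):
        # Back off to a uniform distribution; the reserved mass grows with
        # the number of distinct phoneme types observed.
        types = len(self.unigrams)
        uniform = 1.0 / len(self.vocab)
        return (self.unigrams[w] + types * uniform) / (self.total + types)

    def p_bigram(self, prev, w):
        followers = self.bigrams.get(prev)
        if not followers:
            # Unseen history: use the unigram estimate directly.
            return self.p_unigram(w)
        n = sum(followers.values())  # tokens observed after prev
        t = len(followers)           # distinct types observed after prev
        return (followers[w] + t * self.p_unigram(w)) / (n + t)

    def log_likelihood(self, seq):
        return sum(math.log(self.p_bigram(p, w)) for p, w in zip(seq, seq[1:]))


def identify(models, seq):
    """Return the language whose model scores the phoneme sequence highest."""
    return max(models, key=lambda lang: models[lang].log_likelihood(seq))


# Toy usage with made-up phoneme strings (not OGI-TS data).
vocab = ["a", "b", "c"]
models = {
    "lang1": WittenBellBigram([["a", "b", "a", "b"]], vocab),
    "lang2": WittenBellBigram([["a", "c", "a", "c"]], vocab),
}
print(identify(models, ["a", "b", "a"]))
```

In a PPRLM configuration, one such set of language models would be trained per phoneme recognizer and the per-recognizer scores combined before the final decision; the combination rule is left out of this sketch because it is not described above.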