Adapting monolingual resources for code-mixed Hindi-English speech recognition

The paper presents an automatic speech recognition (ASR) system for code-mixed Hindi-English read speech, built by adapting monolingual training resources. A neural-network-based acoustic model is trained on monolingual Hindi speech pooled with code-mixed speech data. The test corpus follows a similar structure, containing both monolingual and code-mixed utterances. A shared phonetic transcription in WX notation is used to exploit the commonality between the pooled phone sets of Hindi and English. Experiments are conducted under two formulations of a trigram language model: 1) in the first, the test utterances are included in the language-model training data, so the language model contains no out-of-vocabulary words, and the resulting word error rate is 10.63%; 2) in the second, the test utterances are excluded from the language-model training data, and the resulting word error rate is 41.66%.
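Both figures are reported as word error rate (WER), i.e., the word-level edit distance between the reference and hypothesis transcripts normalized by the reference length. The sketch below shows a standard way to compute it; the function name and the illustrative code-mixed sentences are hypothetical and not taken from the paper's corpus.

```python
# Minimal sketch: word error rate (WER) via word-level Levenshtein distance.
# The example utterances are invented for illustration only.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / len(reference)."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                                # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                                # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

if __name__ == "__main__":
    # Illustrative code-mixed sentence (Hindi words romanized for readability).
    ref = "mujhe yeh movie bahut pasand aayi"
    hyp = "mujhe yeh movie bahut pasand hai"
    print(f"WER: {word_error_rate(ref, hyp):.2%}")  # one substitution over six words
```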
