Development and analysis of multilingual phone recognition systems using Indian languages

This paper describes the development of a Multilingual Phone Recognition System (Multi-PRS) using four Indian languages: Kannada, Telugu, Bengali, and Odia. A Multi-PRS is a universal Phone Recognition System (PRS) that performs phone recognition independently of the language. International Phonetic Alphabet (IPA) based transcription is used to group acoustically similar phonetic units from multiple languages. Multilingual phone recognisers for Indian languages are studied using two broad groups, namely Dravidian languages and Indo-Aryan languages; the languages within each group are pooled to develop Bilingual PRSs. We have explored both hidden Markov models (HMMs) and deep neural networks (DNNs) for developing PRSs under both context-dependent and context-independent setups. The state-of-the-art DNNs outperform the HMMs. The performance of the Multi-PRSs is analysed and compared with that of the monolingual PRSs, and the advantages of Multi-PRSs over monolingual PRSs are discussed. Further, we have developed tandem Multi-PRSs, which use phone posteriors as tandem features, to improve on the baseline Multi-PRSs. The tandem Multi-PRSs outperform the baseline Multi-PRSs in all cases.
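The IPA-based grouping step can be illustrated with a minimal sketch. It assumes each language comes with a table that maps its own phone labels to IPA symbols; the tables, label names, and function names below are hypothetical toy values chosen for illustration and are not the phone sets used in this work.

```python
from collections import defaultdict

# Hypothetical toy label-to-IPA tables (assumption: not the paper's phone sets).
LABEL_TO_IPA = {
    "kannada": {"a": "a", "k": "k", "sh": "ʃ"},
    "telugu": {"a": "a", "k": "k", "ch": "tʃ"},
    "bengali": {"a": "a", "k": "k", "sh": "ʃ"},
    "odia": {"a": "a", "k": "k"},
}


def build_common_phone_set(label_to_ipa):
    """Map each IPA symbol to the set of languages whose tables contain it."""
    languages_by_phone = defaultdict(set)
    for language, table in label_to_ipa.items():
        for ipa_symbol in table.values():
            languages_by_phone[ipa_symbol].add(language)
    return dict(languages_by_phone)


def relabel(transcript, language, label_to_ipa):
    """Rewrite a language-specific phone transcript in the common IPA labels."""
    table = label_to_ipa[language]
    return [table[label] for label in transcript]


common = build_common_phone_set(LABEL_TO_IPA)
shared = sorted(p for p, langs in common.items() if len(langs) > 1)
print("common phone set:", sorted(common))
print("phones shared across languages:", shared)
print(relabel(["k", "a", "sh"], "kannada", LABEL_TO_IPA))
```

Once every transcript is relabelled this way, units that map to the same IPA symbol share one model and pool their training data across languages, while language-specific symbols keep their own models.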
