Acoustic modelling for speech recognition in Indian languages in an agricultural commodities task domain

In developing speech-recognition-based services for any task domain, it is necessary to support an increasing number of languages over the life of the service. This paper considers a small-vocabulary speech recognition task in multiple Indian languages. To configure a multi-lingual system for this task domain, an experimental study is presented using data from two linguistically similar languages, Hindi and Marathi, by training a subspace Gaussian mixture model (SGMM) (Povey et al., 2011; Rose et al., 2011) under a multi-lingual scenario (Burget et al., 2010; Mohan et al., 2012a). For this study, speech data was collected from the targeted user population while developing spoken dialogue systems for an agricultural commodities task domain. It is well known that acoustic, channel, and environmental mismatch between data sets from multiple languages is an issue when building multi-lingual systems of this nature. We therefore apply a cross-corpus acoustic normalization procedure, a variant of speaker adaptive training (SAT) (Mohan et al., 2012a). The resulting multi-lingual system provides the best speech recognition performance for both languages. Further, the effect of sharing "similar" context-dependent states from Marathi on Hindi speech recognition performance is presented.
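The multi-lingual SGMM idea can be sketched as follows. This is an illustrative toy example, not the paper's implementation: in an SGMM (Povey et al., 2011), the projection matrices, weight vectors, and covariances are shared globally (and, in the multi-lingual setting, across languages), while each tied context-dependent state keeps only a low-dimensional state vector. All dimensions and parameter values below are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D, S, I = 13, 14, 4   # feature dim, subspace dim, number of shared Gaussians

# Globally shared parameters (shared across states, and across languages
# in the multi-lingual scenario described in the abstract):
M = rng.normal(size=(I, D, S))       # mean-projection matrices M_i
w = rng.normal(size=(I, S))          # weight-projection vectors w_i
Sigma = np.stack([np.eye(D)] * I)    # full covariances Sigma_i

def sgmm_log_likelihood(x, v):
    """log p(x | state) for a state with vector v.
    Means mu_i = M_i v and weights w_i = softmax_i(w_i^T v) are derived
    from the shared parameters; only v is state-specific."""
    mu = M @ v                                   # (I, D) state-dependent means
    logits = w @ v
    log_weights = logits - np.logaddexp.reduce(logits)   # log-softmax
    diff = x - mu                                # (I, D)
    _, logdet = np.linalg.slogdet(Sigma)         # (I,)
    solved = np.linalg.solve(Sigma, diff[..., None])[..., 0]
    quad = np.einsum('id,id->i', diff, solved)   # Mahalanobis terms
    log_gauss = -0.5 * (D * np.log(2 * np.pi) + logdet + quad)
    return np.logaddexp.reduce(log_weights + log_gauss)

v_j = rng.normal(size=S)   # per-state vector (the language-specific part)
x = rng.normal(size=D)     # one acoustic feature frame
print(sgmm_log_likelihood(x, v_j))
```

Because only the `v_j` vectors are state- and language-specific, pooling data from Hindi and Marathi to estimate the shared `M`, `w`, and `Sigma` is what makes the multi-lingual training in the abstract attractive for low-resource languages.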

[1] Tanja Schultz et al., Multilingual Speech Processing, 2006.

[2] Sanjeev Khudanpur et al., Pronunciation ambiguity vs. pronunciation variability in speech recognition, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2000.

[3] Udhyakumar Nallasamy et al., Speech interfaces for equitable access to information technology, 2007.

[4] B. Bowonder et al., Developing a Rural Market e-hub: The case study of e-Choupal experience of ITC, 2003.

[5] Alex Acero et al., Separating Speaker and Environmental Variability Using Factored Transforms, INTERSPEECH, 2011.

[6] Kishore Prahallad et al., A speech-based conversation system for accessing agriculture commodity prices in Indian languages, 2011 Joint Workshop on Hands-free Speech Communication and Microphone Arrays, 2011.

[7] Kai Feng et al., Multilingual acoustic modeling for speech recognition based on subspace Gaussian Mixture Models, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2010.

[8] Stephen Cox et al., Some statistical issues in the comparison of speech recognition algorithms, International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1989.

[9] Ngoc Thang Vu et al., Cross-language bootstrapping based on completely unsupervised training using multilingual A-stabil, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011.

[10] Dilek Z. Hakkani-Tür et al., Bootstrapping Language Models for Spoken Dialog Systems From The World Wide Web, 2006 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2006.

[11] Tanja Schultz et al., Grapheme based speech recognition, INTERSPEECH, 2003.

[12] Srinivasan Umesh et al., Subspace based for Indian languages, 2012 11th International Conference on Information Science, Signal Processing and their Applications (ISSPA), 2012.

[13] Yun Tang et al., An investigation of subspace modeling for phonetic and speaker variability in automatic speech recognition, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011.

[14] Tanja Schultz et al., Language-independent and language-adaptive acoustic modeling for speech recognition, Speech Communication, 2001.

[15] Steve J. Young et al., Bootstrapping language models for dialogue systems, INTERSPEECH, 2006.

[16] Richard M. Stern et al., Automatic clustering and generation of contextual questions for tied states in hidden Markov models, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1999.

[17] Simon King et al., IEEE Workshop on Automatic Speech Recognition and Understanding, 2009.

[18] Mark J. F. Gales et al., Multiple-cluster adaptive training schemes, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2001.

[19] Steve Young et al., The HTK Book (version 3.4), 2006.

[20] Daniel Povey et al., A Tutorial-style Introduction to Subspace Gaussian Mixture Models for Speech Recognition, 2009.

[21] Tapan S. Parikh et al., Avaaj Otalo: a field study of an interactive voice forum for small farmers in rural India, CHI, 2010.

[22] Richard C. Rose et al., Dealing with acoustic mismatch for training multilingual subspace Gaussian mixture models for speech recognition, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012.

[23] Colin P. Masica, The Indo-Aryan Languages, 1991.

[24] Raj Reddy et al., Automatic Speech Recognition: The Development of the Sphinx Recognition System, 1988.

[25] Central Hindi Directorate, India, Devanagari: Development, Amplification and Standardisation, 1977.

[26] Malcolm D. Hyman et al., Linguistic Issues in Encoding Sanskrit, 2012.

[27] Kai Feng et al., The subspace Gaussian mixture model: A structured model for speech recognition, Computer Speech & Language, 2011.

[28] Liang Lu et al., Maximum a posteriori adaptation of subspace Gaussian mixture models for cross-lingual speech recognition, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012.

[29] Liang Lu et al., Regularized subspace Gaussian mixture models for cross-lingual speech recognition, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU), 2011.

[30] Mark J. F. Gales, Acoustic factorisation, IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2001.

[31] Mark J. F. Gales et al., Maximum likelihood linear transformations for HMM-based speech recognition, Computer Speech & Language, 1998.

[32] Jia Liu et al., State-Level Data Borrowing for Low-Resource Speech Recognition Based on Subspace GMMs, INTERSPEECH, 2011.

[33] Ratnadeep R. Deshmukh et al., Indian Language Speech Database: A Review, 2012.

[34] Raj Reddy et al., Automatic Speech Recognition: The Development of the SPHINX System, 2013.