Overcoming Data Sparsity in Acoustic Modeling of Low-Resource Language by Borrowing Data and Model Parameters from High-Resource Languages

In this paper, we propose two techniques to improve the acoustic model of a low-resource language by: (i) Pooling data from closely related languages using a phoneme mapping algorithm to build acoustic models like subspace Gaussian mixture model (SGMM), phone cluster adaptive training (Phone-CAT), deep neural network (DNN) and convolutional neural network (CNN). Using the low-resource language data, we then adapt the afore mentioned models towards that language. (ii) Using models built from high-resource languages, we first borrow subspace model parameters from SGMM/Phone-CAT; or hidden layers from DNN/CNN. The language specific parameters are then estimated using the lowresource language data. The experiments were performed on four Indian languages namely Assamese, Bengali, Hindi and Tamil. Relative improvements of 10 to 30% were obtained over corresponding monolingual models in each case.

[1]  Kai Feng,et al.  Multilingual acoustic modeling for speech recognition based on subspace Gaussian Mixture Models , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[2]  Srinivasan Umesh,et al.  Acoustic modeling using transform-based phone-cluster adaptive training , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.

[3]  William J. Byrne,et al.  Towards language independent acoustic modeling , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[4]  Laurent Besacier,et al.  First steps in fast acoustic modeling for a new target language: application to Vietnamese , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[5]  Tara N. Sainath,et al.  Deep convolutional neural networks for LVCSR , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[6]  Ngoc Thang Vu,et al.  Multilingual deep neural network based acoustic modeling for rapid language adaptation , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[7]  Jia Liu,et al.  Combination of data borrowing strategies for low-resource LVCSR , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.

[8]  Ralf Schlüter,et al.  Investigation on cross- and multilingual MLP features under matched and mismatched acoustical conditions , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[9]  Tara N. Sainath,et al.  Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.

[10]  Tanja Schultz,et al.  Experiments on cross-language acoustic modeling , 2001, INTERSPEECH.

[11]  Tanja Schultz,et al.  Automatic speech recognition for under-resourced languages: A survey , 2014, Speech Commun..

[12]  Jean-Luc Gauvain,et al.  Active learning based data selection for limited resource STT and KWS , 2015, INTERSPEECH.

[13]  Daniel Povey,et al.  The Kaldi Speech Recognition Toolkit , 2011 .

[14]  Neethu Mariam Joy,et al.  A data-driven phoneme mapping technique using interpolation vectors of phone-cluster adaptive training , 2014, 2014 IEEE Spoken Language Technology Workshop (SLT).

[15]  Brian Kingsbury,et al.  Multilingual representations for low resource speech recognition and keyword search , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).

[16]  Mark J. F. Gales,et al.  Investigation of multilingual deep neural networks for spoken term detection , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.

[17]  Liang Lu,et al.  Cross-Lingual Subspace Gaussian Mixture Models for Low-Resource Speech Recognition , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[18]  Ngoc Thang Vu,et al.  Multilingual bottle-neck features and its application for under-resourced languages , 2012, SLTU.

[19]  Partha Lal,et al.  Cross-Lingual Automatic Speech Recognition Using Tandem Features , 2013, IEEE Transactions on Audio, Speech, and Language Processing.

[20]  Haizhou Li,et al.  Robust phone set mapping using decision tree clustering for cross-lingual phone recognition , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[21]  Laurent Besacier,et al.  Automatic Speech Recognition for Under-Resourced Languages: Application to Vietnamese Language , 2009, IEEE Transactions on Audio, Speech, and Language Processing.

[22]  Yifan Gong,et al.  Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[23]  Kai Feng,et al.  The subspace Gaussian mixture model - A structured model for speech recognition , 2011, Comput. Speech Lang..