Adaptation of multilingual stacked bottle-neck neural network structure for new language

Neural-network-based features have become an inseparable part of state-of-the-art LVCSR systems. To perform well, the network has to be trained on a large amount of in-domain data. With the increasing emphasis on fast development of ASR systems with limited resources, there is an effort to alleviate the need for in-domain data. To evaluate the effectiveness of other resources, we trained the Stacked Bottle-Neck neural network structure on multilingual data, investigating several training strategies while treating the target language as unseen. The systems were then adapted to the target language by re-training. Finally, we evaluated the effect of adapting the individual NNs in the Stacked Bottle-Neck structure to find the optimal adaptation strategy. We show that adaptation can significantly improve system performance over both the multilingual network and a network trained only on target data. The experiments were performed on Babel Year 1 data.
