Paper information - Using language adaptive deep neural networks for improved multilingual speech recognition

Building Large Vocabulary Continuous Speech Recognition (LVCSR) systems for under-resourced languages is a challenging task. While plenty of data is available for English, many other languages suffer from a lack of data. There are different methods for tackling this challenge. One possibility is to use data from other languages to boost the performance of a system for a particular target language. With the emergence of LVCSR systems based on neural networks (NNs), many research groups have demonstrated the benefits of using additional data to improve system performance. In this work, we propose a method for providing language information directly to the network, thus enabling it to become language adaptive. We demonstrate the effectiveness of our approach in a series of experiments.
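The abstract does not state how the language information is encoded or where it enters the network. As a minimal sketch of the general idea, assuming (purely for illustration) a one-hot language code appended to every acoustic feature frame before it reaches the network, the input augmentation could look like the following. The names `one_hot`, `append_language_code`, and `LANGUAGES`, as well as the toy feature values, are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def one_hot(index: int, size: int) -> List[float]:
    """Return a one-hot vector of length `size` with a 1.0 at `index`."""
    if not 0 <= index < size:
        raise ValueError("index out of range for one-hot vector")
    return [1.0 if i == index else 0.0 for i in range(size)]


def append_language_code(
    frames: Sequence[Sequence[float]],
    language: str,
    languages: Sequence[str],
) -> List[List[float]]:
    """Append the same one-hot language code to every feature frame.

    Assumption: the network input is the acoustic frame concatenated
    with the language code. The paper may encode or inject the
    language information differently.
    """
    code = one_hot(list(languages).index(language), len(languages))
    return [list(frame) + code for frame in frames]


# Hypothetical language inventory and toy 2-dimensional "acoustic" frames.
LANGUAGES = ["en", "de", "fr"]
frames = [[0.1, 0.2], [0.3, 0.4]]

augmented = append_language_code(frames, "de", LANGUAGES)
for frame in augmented:
    print(frame)
```

With this kind of input, a single multilingual network receives an explicit signal about which language each frame belongs to, instead of having to infer it from the acoustics alone.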

Markus Müller | Alex Waibel
