Multi-Level Cross-Lingual Transfer Learning With Language Shared and Specific Knowledge for Spoken Language Understanding

Recently, conversational agents have substantially improved their understanding capabilities with neural networks. Such deep neural models, however, do not extend to most human languages because annotated training data for the relevant NLP tasks is scarce. In this paper, we propose a multi-level cross-lingual transfer model with language-shared and language-specific knowledge to improve spoken language understanding in low-resource languages. Our method explicitly separates the model into a language-shared part and a language-specific part, transferring cross-lingual knowledge to improve monolingual slot tagging, especially for low-resource languages. To refine the shared knowledge, we add a language discriminator and employ adversarial training to reinforce the separation of information. In addition, we adopt a novel multi-level knowledge transfer scheme that works incrementally and progressively, acquiring multi-granularity shared knowledge rather than knowledge from a single layer. To mitigate the discrepancy between the feature distributions of language-specific and shared knowledge, we propose neural adapters that fuse the two kinds of knowledge automatically. Experiments show that our model consistently outperforms the monolingual baseline by a statistically significant margin of up to 2.09%, and by an even larger 12.21% in the zero-shot setting.
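To make the shared/specific split, the adversarial language discriminator, and the adapter-based fusion more concrete, the following is a minimal PyTorch sketch. The module names, the BiLSTM encoders, the mean-pooled discriminator input, and the single-layer adapter are illustrative assumptions rather than the authors' implementation; the multi-level progressive transfer and CRF decoding described in the paper are omitted here.

```python
# Minimal sketch (assumptions, not the authors' code): a language-shared and a
# language-specific encoder, an adversarial language discriminator on the shared
# features via gradient reversal, and a neural adapter that fuses the two streams
# before slot tagging.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses and scales gradients in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class SharedSpecificTagger(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden, n_slots, n_langs, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Language-shared and language-specific encoders (BiLSTMs assumed).
        self.shared_enc = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.spec_enc = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        # Language discriminator over pooled shared features.
        self.lang_disc = nn.Linear(2 * hidden, n_langs)
        # Neural adapter: projects shared features before fusing with specific ones.
        self.adapter = nn.Sequential(nn.Linear(2 * hidden, 2 * hidden), nn.Tanh())
        self.slot_head = nn.Linear(4 * hidden, n_slots)

    def forward(self, tokens):
        emb = self.embed(tokens)                         # (B, T, E)
        shared, _ = self.shared_enc(emb)                 # (B, T, 2H)
        specific, _ = self.spec_enc(emb)                 # (B, T, 2H)
        # Adversarial branch: the discriminator learns to identify the language,
        # while the reversed gradient pushes shared features to be language-agnostic.
        pooled = shared.mean(dim=1)
        lang_logits = self.lang_disc(GradReverse.apply(pooled, self.lambd))
        # Adapter-mediated fusion of shared and specific knowledge.
        fused = torch.cat([self.adapter(shared), specific], dim=-1)
        slot_logits = self.slot_head(fused)              # (B, T, n_slots)
        return slot_logits, lang_logits
```

In training, the slot-tagging loss and the discriminator's language-classification loss would be summed; the gradient-reversal layer turns the latter into an adversarial signal for the shared encoder, which is one common way to realize the information separation the abstract describes.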
