Building a Task-oriented Dialog System for Languages with no Training Data: the Case for Basque

This paper presents an approach to developing a task-oriented dialog system for less-resourced languages in scenarios where no training data is available, tackling both intent classification and slot filling. We project existing annotations from rich-resource languages into the target language by means of Neural Machine Translation (NMT) and posterior word alignment, and then compare training on the projected monolingual data with direct model-transfer alternatives. Intent classifiers and slot-filling sequence taggers are implemented either with a BiLSTM architecture or by fine-tuning BERT transformer models. Models learnt exclusively from Basque projected data yield better accuracy for slot filling, whereas for intent classification, combining the projected Basque training data with rich-resource language data consistently outperforms models trained solely on projected data. In any case, we achieve competitive performance in both tasks, with accuracies of 81% for intent classification and 77% for slot filling.
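The core of the annotation-projection step described above can be sketched as follows. This is a minimal, hypothetical illustration (not the paper's actual implementation): it assumes BIO-style slot labels on the source utterance and word alignments given as (source index, target index) pairs, as produced by common aligners, and copies each slot label onto the aligned target token.

```python
def project_slot_labels(src_labels, alignments, tgt_len):
    """Project BIO slot labels from a source utterance onto its
    translation via word alignments [(src_idx, tgt_idx), ...].
    Unaligned target tokens receive the "O" label."""
    tgt_labels = ["O"] * tgt_len
    # Walk target positions left to right so B-/I- prefixes stay consistent.
    for src_idx, tgt_idx in sorted(alignments, key=lambda p: p[1]):
        label = src_labels[src_idx]
        if label == "O":
            continue
        slot = label.split("-", 1)[1]          # e.g. "B-city" -> "city"
        prev = tgt_labels[tgt_idx - 1] if tgt_idx > 0 else "O"
        if prev.endswith("-" + slot):
            tgt_labels[tgt_idx] = "I-" + slot  # continue an open span
        else:
            tgt_labels[tgt_idx] = "B-" + slot  # start a new span
    return tgt_labels
```

For example, projecting the labels of "flights to boston tomorrow" onto a 3-token translation where token 0 aligns to "boston" and token 2 to "tomorrow" yields `["B-city", "O", "B-date"]`. Alignment noise (many-to-one or unaligned content words) is the main source of projection errors in practice.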
