Deep Cascade Multi-task Learning for Slot Filling in Chinese E-commerce Shopping Guide Assistant

Slot filling is a critical task in natural language understanding (NLU) for dialog systems. State-of-the-art solutions regard it as a sequence labeling task and adopt BiLSTM-CRF models. While BiLSTM-CRF models work relatively well on standard datasets, they face challenges in Chinese E-commerce slot filling due to more informative slot labels and richer expressions. In this paper, we propose a deep multi-task learning model with cascade and residual connections. Experimental results show that our framework not only achieves competitive performance with state-of-the-art methods on a standard dataset, but also outperforms strong baselines by a substantial margin of 14.6% on a Chinese E-commerce dataset.
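
The abstract does not spell out the architecture, but the idea of a cascade with a residual connection can be illustrated with a minimal sketch: a shared BiLSTM encoder feeds a low-level tagging head, and the low-level hidden states are added back to the shared representation (the residual connection) before the high-level slot-labeling head. The following PyTorch code is a hypothetical illustration under those assumptions, not the authors' implementation; all module names, dimensions, and tag counts are illustrative, and the CRF layer is omitted for brevity.

```python
import torch
import torch.nn as nn

class CascadeTagger(nn.Module):
    """Hypothetical two-level cascade tagger with a residual connection."""
    def __init__(self, vocab_size, emb_dim=100, hidden=128,
                 n_seg_tags=4, n_slot_tags=30):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Shared bidirectional encoder over the input sequence.
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True,
                               bidirectional=True)
        # Low-level task head (e.g., coarse segment tagging).
        self.seg_lstm = nn.LSTM(2 * hidden, hidden, batch_first=True,
                                bidirectional=True)
        self.seg_out = nn.Linear(2 * hidden, n_seg_tags)
        # High-level task head (fine-grained slot labeling), fed by the cascade.
        self.slot_lstm = nn.LSTM(2 * hidden, hidden, batch_first=True,
                                 bidirectional=True)
        self.slot_out = nn.Linear(2 * hidden, n_slot_tags)

    def forward(self, tokens):
        shared, _ = self.encoder(self.embed(tokens))
        seg_hidden, _ = self.seg_lstm(shared)
        seg_logits = self.seg_out(seg_hidden)
        # Residual connection: add the shared representation back in so the
        # high-level task sees both the raw encoding and the cascaded features.
        slot_hidden, _ = self.slot_lstm(shared + seg_hidden)
        slot_logits = self.slot_out(slot_hidden)
        return seg_logits, slot_logits

model = CascadeTagger(vocab_size=5000)
tokens = torch.randint(0, 5000, (2, 12))    # batch of 2 sequences, length 12
seg_logits, slot_logits = model(tokens)
print(seg_logits.shape, slot_logits.shape)  # (2, 12, 4) (2, 12, 30)
```

In a multi-task setup of this kind, both heads would typically be trained jointly, with the total loss being a weighted sum of the per-task tagging losses.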
