Hierarchical Multi-Task Natural Language Understanding for Cross-domain Conversational AI: HERMIT NLU

We present a new neural architecture for wide-coverage Natural Language Understanding in Spoken Dialogue Systems. We develop a hierarchical multi-task architecture that delivers a multi-layer representation of sentence meaning (i.e., Dialogue Acts and Frame-like structures). The architecture is a hierarchy of self-attention mechanisms and BiLSTM encoders followed by CRF tagging layers. We describe a variety of experiments showing that our approach obtains promising results on a dataset annotated with Dialogue Acts and Frame Semantics. Moreover, we demonstrate its applicability to a different, publicly available NLU dataset annotated with domain-specific intents and corresponding semantic roles, where its overall performance is higher than that of state-of-the-art tools such as Rasa, Dialogflow, LUIS, and Watson. For example, we show an average 4.45% improvement in entity tagging F-score over Rasa, Dialogflow, and LUIS.
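
To make the multi-layer representation concrete, the following is a minimal sketch of how per-token tags from several tagging layers could be collapsed into labelled spans. It is not the authors' implementation. It assumes that each layer (dialogue act, frame, frame element) emits one BIO tag per token, which the abstract does not state explicitly. The names `bio_to_spans` and `decode_layers`, the example utterance, and all tag labels are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Span = Tuple[str, int, int]  # (label, start index, end index exclusive)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Collapse a BIO tag sequence into labelled token spans."""
    spans: List[Span] = []
    label, start = None, 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        if tag.startswith("I-") and label == tag[2:]:
            continue  # the current span keeps growing
        if label is not None:
            spans.append((label, start, i))
            label = None
        if tag.startswith("B-") or tag.startswith("I-"):
            # A stray I- tag with no matching B- is treated as a span start.
            label, start = tag[2:], i
    return spans


def decode_layers(
    tokens: List[str], layers: Dict[str, List[str]]
) -> Dict[str, List[Tuple[str, str]]]:
    """Turn each layer's tag sequence into (label, text) pairs."""
    decoded = {}
    for name, tags in layers.items():
        if len(tags) != len(tokens):
            raise ValueError(f"layer {name!r} has {len(tags)} tags "
                             f"for {len(tokens)} tokens")
        decoded[name] = [
            (label, " ".join(tokens[start:end]))
            for label, start, end in bio_to_spans(tags)
        ]
    return decoded


# Hypothetical utterance and labels, for illustration only.
tokens = ["put", "the", "mug", "on", "the", "table"]
layers = {
    "dialogue_act": ["B-Request"] + ["I-Request"] * 5,
    "frame": ["B-Placing"] + ["I-Placing"] * 5,
    "frame_element": ["O", "B-Theme", "I-Theme", "B-Goal", "I-Goal", "I-Goal"],
}

if __name__ == "__main__":
    for layer, spans in decode_layers(tokens, layers).items():
        print(layer, spans)
```

Under these assumptions, one utterance yields one labelled view per layer, so a downstream dialogue manager could read the dialogue act, the evoked frame, and its arguments from the same token sequence.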
