Speaker Role Contextual Modeling for Language Understanding and Dialogue Policy Learning

Language understanding (LU) and dialogue policy learning are two essential components in conversational systems. Human-human dialogues are not well-controlled: speakers behave in varied and often unpredictable ways because of their individual goals and speaking habits. This paper proposes a role-based contextual model that encodes each speaker role independently, capturing the distinct speaking patterns that different roles exhibit in multi-turn dialogues. Experiments on a benchmark dataset show that the proposed role-based model successfully learns role-specific behavioral patterns for contextual encoding and significantly improves both language understanding and dialogue policy learning.
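The core idea of role-based contextual encoding can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual architecture: the role names ("guide", "tourist"), the per-role projection matrices, and the mean-pooled summaries are all simplifying assumptions standing in for the paper's learned role-specific encoders.

```python
import numpy as np

rng = np.random.default_rng(0)
EMB, HID = 8, 6  # hypothetical embedding and hidden sizes

# Hypothetical role-specific projections: one encoder per speaker role,
# so each role's speaking patterns are modeled independently.
role_encoders = {role: rng.standard_normal((EMB, HID)) * 0.1
                 for role in ("guide", "tourist")}

def encode_history(history):
    """Encode a dialogue history into a role-aware context vector.

    `history` is a list of (role, utterance_embedding) pairs. Each
    role's utterances are summarized by that role's own encoder, and
    the per-role summaries are concatenated into one context vector.
    """
    summaries = []
    for role, enc in sorted(role_encoders.items()):
        turns = [np.tanh(emb @ enc) for r, emb in history if r == role]
        summaries.append(np.mean(turns, axis=0) if turns
                        else np.zeros(HID))
    return np.concatenate(summaries)  # shape: (2 * HID,)

# Toy history: alternating turns with random stand-in "embeddings".
history = [("tourist", rng.standard_normal(EMB)),
           ("guide", rng.standard_normal(EMB)),
           ("tourist", rng.standard_normal(EMB))]
context = encode_history(history)
print(context.shape)  # (12,)
```

In a full model, the concatenated role-aware context vector would feed a downstream LU classifier or dialogue policy network; the key design choice illustrated here is that each role gets its own encoder parameters rather than sharing a single contextual encoder across all speakers.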
