Deep Dialog Act Recognition using Multiple Token, Segment, and Context Information Representations

A dialog act is a representation of an intention transmitted in the form of words. In this sense, when someone wants to transmit some intention, it is revealed both in the selected words and in how they are combined to form a structured segment. Furthermore, the intentions of a speaker depend not only on her intrinsic motivation, but also on the history of the dialog and the expectation she has of its future. In this article, we explore multiple representation approaches to capture cues for intention at different levels. Recent approaches to automatic dialog act recognition use Word2Vec embeddings for word representation. However, these capture neither segment structure information nor morphological traits related to intention. Thus, we also explore the use of dependency-based word embeddings, as well as character-level tokenization. To generate the segment representation, the top-performing approaches on the task use either RNNs, which capture information concerning the sequentiality of the tokens, or CNNs, which capture token patterns that reveal function. However, both aspects are important and should be captured together. Thus, we also explore the use of an RCNN. Finally, context information concerning turn-taking, as well as that provided by the surrounding segments, has proved important in previous studies. However, the representation approaches used for the latter in those studies are not appropriate for capturing sequentiality, which is one of the most important characteristics of the segments in a dialog. Thus, we explore the use of approaches able to capture that information. By combining the best approaches for each aspect, we achieve results that surpass the previous state of the art in a dialog system context and are similar to human-level performance in an annotation context on the Switchboard Dialog Act Corpus, which is the most explored corpus for the task.
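
The three levels of information named above (tokens, segment, context) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Segment` class, the `build_example` helper, the `history` parameter, and the labels "question" and "answer" are all hypothetical names chosen for illustration, and whitespace splitting stands in for real word tokenization. The sketch only shows which inputs a recognizer of this kind might receive: word-level and character-level tokens of the current segment, a turn-taking flag, and the labels of the preceding segments kept in their original order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """One dialog segment (hypothetical container, not from the article)."""
    speaker: str
    text: str
    label: Optional[str] = None  # dialog act label, if already known


def word_tokens(text: str) -> List[str]:
    """Word-level tokens; whitespace splitting is a simplifying assumption."""
    return text.lower().split()


def char_tokens(text: str) -> List[str]:
    """Character-level tokens: every character, including spaces and
    punctuation, becomes a token, so morphological cues are preserved."""
    return list(text)


def build_example(dialog: List[Segment], index: int, history: int = 3) -> dict:
    """Collect token, turn-taking, and context information for one segment."""
    current = dialog[index]
    # Preceding segments are kept in order, so sequentiality is not lost.
    previous = dialog[max(0, index - history):index]
    return {
        "words": word_tokens(current.text),
        "chars": char_tokens(current.text),
        # Turn-taking cue: did the speaker change since the last segment?
        "speaker_changed": index > 0
        and dialog[index - 1].speaker != current.speaker,
        "previous_labels": [seg.label for seg in previous],
    }


# Illustrative dialog with made-up labels (not SWBD-DAMSL tags).
dialog = [
    Segment("A", "Do you like jazz?", "question"),
    Segment("B", "Yeah, I do.", "answer"),
    Segment("B", "What about you?"),
]
example = build_example(dialog, 2)
```

In a full system, each of these inputs would be mapped to embeddings and passed to the segment and context encoders; the sketch stops before that step because the abstract does not specify those details.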
