General-Purpose Communicative Function Recognition using a Hierarchical Network with Cascading Outputs and Maximum a Posteriori Path Estimation

ISO 24617-2, the standard for dialog act annotation, defines a hierarchically organized set of general-purpose communicative functions. The automatic recognition of these functions, although practically unexplored, is relevant for dialog systems, since the functions provide cues about the intention behind each segment and how it should be interpreted. In this paper, we explore the recognition of general-purpose communicative functions in the DialogBank, a reference set of dialogs annotated according to the standard. To do so, we adapt a state-of-the-art approach to flat dialog act recognition so that it can deal with the hierarchical classification problem. More specifically, we propose a hierarchical network with cascading outputs and maximum a posteriori path estimation to predict the communicative function at each level of the hierarchy, preserve the dependencies between the functions in the path, and decide at which level to stop. Furthermore, since the number of dialogs in the DialogBank is small, we rely both on additional dialogs whose annotations were mapped into the standard and on transfer learning to improve performance. The results of our experiments show that the hierarchical approach outperforms a flat one and that maximum a posteriori estimation outperforms an iterative prediction approach based on masking.
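As a rough illustration of maximum a posteriori path estimation over a label hierarchy, the following Python sketch scores every root-to-node path by the product of its per-level probabilities and returns the most probable one. It is a minimal sketch under stated assumptions, not the network described in this paper: the toy hierarchy is a simplified subset of labels, the probabilities in `PROBS` are made-up numbers standing in for the outputs a trained model would produce, and the explicit `STOP` option is only one hypothetical way of modeling the decision of where to stop.

```python
# Hypothetical, simplified label hierarchy (not the full ISO 24617-2 tree).
CHILDREN = {
    "ROOT": ["Information-seeking", "Information-providing"],
    "Information-seeking": ["Question"],
    "Question": ["Propositional Question", "Set Question"],
    "Information-providing": ["Inform"],
    "Inform": ["Answer", "Agreement"],
}

# Assumed marker meaning "the path ends at this node".
STOP = "<stop>"

# Made-up conditional probabilities P(option | parent); each row sums to 1.
# In a real system these would come from the classifier's per-level outputs.
PROBS = {
    "ROOT": {"Information-seeking": 0.7, "Information-providing": 0.3},
    "Information-seeking": {"Question": 0.9, STOP: 0.1},
    "Question": {"Propositional Question": 0.35, "Set Question": 0.25, STOP: 0.40},
    "Information-providing": {"Inform": 0.8, STOP: 0.2},
    "Inform": {"Answer": 0.6, "Agreement": 0.3, STOP: 0.1},
}


def enumerate_paths(node="ROOT", prefix=()):
    """Yield every path that starts below the root and ends at any node."""
    for child in CHILDREN.get(node, []):
        path = prefix + (child,)
        yield path
        yield from enumerate_paths(child, path)


def path_probability(path, probs):
    """Multiply the conditional probabilities along the path.

    If the path ends at an internal node, the probability of stopping
    there is included as well.
    """
    p = 1.0
    parent = "ROOT"
    for label in path:
        p *= probs[parent][label]
        parent = label
    if parent in CHILDREN:
        p *= probs[parent][STOP]
    return p


def map_path(probs):
    """Return the path with the highest probability."""
    return max(enumerate_paths(), key=lambda path: path_probability(path, probs))


best = map_path(PROBS)
print(best, round(path_probability(best, PROBS), 4))
```

With these illustrative numbers, the most probable path stops at the intermediate `Question` level rather than committing to a more specific leaf, which is the kind of stopping decision the abstract refers to. Exhaustive enumeration is only practical because the hierarchy is small.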
