Augmenting Small Data to Classify Contextualized Dialogue Acts for Exploratory Visualization

Our goal is to develop an intelligent assistant that helps users explore data via visualizations. We have collected a new corpus of conversations, CHICAGO-CRIME-VIS, geared towards supporting data visualization exploration, and we have annotated it for a variety of features, including contextualized dialogue acts. In this paper, we describe our strategies for dialogue act classification and their evaluation. We highlight how thinking aloud affects the interpretation of dialogue acts in our setting and how best to capture that information. A key component of our strategy is data augmentation applied to the training data, since our corpus is inherently small. We ran experiments with the Balanced Bagging Classifier (BAGC), Conditional Random Field (CRF), and several Long Short-Term Memory (LSTM) networks, and found that all of them improved over the baseline (i.e., without the data augmentation pipeline). CRF outperformed the other classification algorithms, with the LSTM networks showing modest improvement, even after obtaining a performance boost from domain-trained word embeddings. This result is of note because training a CRF is far less resource-intensive than training deep learning models; given similar if not better performance, traditional methods may still be preferable in order to lower resource consumption.
