Iris: A Conversational Agent for Complex Tasks

Today, most conversational agents are limited to simple tasks supported by standalone commands, such as getting directions or scheduling an appointment. To support more complex tasks, agents must be able to generalize from and combine the commands they already understand. This paper presents a new approach to designing conversational agents, inspired by linguistic theory, in which agents execute complex requests interactively by combining commands through nested conversations. We demonstrate this approach in Iris, an agent that can perform open-ended data science tasks such as lexical analysis and predictive modeling. To power Iris, we created a domain-specific language that transforms Python functions into combinable automata and regulates their combinations through a type system. In a user study examining the strengths and limitations of our approach, data scientists completed a modeling task 2.6 times faster with Iris than with Jupyter Notebook.
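
The idea of combining commands under a type system can be illustrated with a minimal sketch. This is not Iris's actual domain-specific language: the names `Command`, `Call`, and `run`, and the two toy commands, are hypothetical and chosen only for illustration. The sketch assumes that each wrapped Python function declares its argument and return types, and that an argument may be filled either by a plain value or by a nested command whose declared return type matches.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Union


@dataclass
class Command:
    """A Python function wrapped with declared argument and return types (hypothetical)."""
    name: str
    func: Callable[..., Any]
    arg_types: Dict[str, type]
    return_type: type


@dataclass
class Call:
    """A request to run a command; each argument is a plain value or a nested Call."""
    command: Command
    args: Dict[str, Union["Call", Any]]


def run(call: Call) -> Any:
    """Resolve nested calls first, rejecting combinations whose types do not match."""
    resolved = {}
    for arg_name, expected in call.command.arg_types.items():
        value = call.args[arg_name]
        if isinstance(value, Call):
            # Check the nested command's declared return type before running it.
            if not issubclass(value.command.return_type, expected):
                raise TypeError(
                    f"{value.command.name} returns {value.command.return_type.__name__}, "
                    f"but {call.command.name}.{arg_name} expects {expected.__name__}"
                )
            value = run(value)
        elif not isinstance(value, expected):
            raise TypeError(
                f"{call.command.name}.{arg_name} expects {expected.__name__}"
            )
        resolved[arg_name] = value
    return call.command.func(**resolved)


# Two toy commands standing in for data science functions.
tokenize = Command("tokenize", lambda text: text.lower().split(), {"text": str}, list)
count = Command("count", lambda items: len(items), {"items": list}, int)

# A nested request: the argument of "count" is filled by running "tokenize".
nested = Call(count, {"items": Call(tokenize, {"text": "Iris combines simple commands"})})
result = run(nested)  # 4
```

In this sketch, a mismatched combination (for example, passing the output of `count` as the `text` argument of `tokenize`) is rejected before the nested command runs, which is one simple way a type system could regulate which commands may be combined.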
