Geno: A Developer Tool for Authoring Multimodal Interaction on Existing Web Applications

Supporting voice commands in applications offers significant benefits to users. However, as our formative study shows, adding such support to existing GUI-based web apps is labor-intensive and has a high learning barrier, due to the lack of unified tooling for creating multimodal interfaces. We present Geno, a developer tool for adding a voice input modality to existing web apps without requiring significant NLP expertise. Geno provides a unified workflow for developers to specify the functionality to support by voice (intents), create language models for detecting intents and the relevant information (parameters) in user utterances, and fulfill the intents either by programmatically invoking the corresponding functions or by replaying GUI actions on the web app. Geno further supports references to GUI context in voice commands (e.g., "add this to the playlist"). In a study, developers with little NLP expertise were able to add multimodal support to two existing web apps using Geno.
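To make the described workflow concrete, the sketch below shows what specifying an intent, its example utterances, its parameters, and a programmatic fulfillment might look like. This is a hypothetical illustration under assumed names (`registerIntent`, `IntentSpec`, `addSongToPlaylist`), not Geno's actual API.

```typescript
// Hypothetical sketch of the intent-specification workflow described above;
// names and signatures are illustrative assumptions, not Geno's actual API.

interface IntentSpec {
  name: string;                                        // intent identifier
  examples: string[];                                  // utterances used to train the language model
  parameters: string[];                                // slots to extract from an utterance
  fulfill: (params: Record<string, string>) => void;   // invoked when the intent is detected
}

const intents: IntentSpec[] = [];

// Register an intent: example utterances train the intent classifier,
// and bracketed words mark parameter spans.
function registerIntent(spec: IntentSpec): void {
  intents.push(spec);
}

registerIntent({
  name: "addSongToPlaylist",
  examples: [
    "add [song] to the playlist",
    "put [song] on my playlist",
    "add this to the playlist",   // "this" would resolve to the GUI element in focus
  ],
  parameters: ["song"],
  fulfill: ({ song }) => {
    // Fulfillment could call an existing app function directly, e.g. addToPlaylist(song),
    // or replay recorded GUI actions (click the song row, press the "Add" button).
    console.log(`Adding "${song}" to the playlist`);
  },
});
```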
