Minuet: Multimodal Interaction with an Internet of Things

A large number of Internet-of-Things (IoT) devices will soon populate our physical environments. Yet IoT devices’ reliance on mobile applications and voice-only assistants as their primary interfaces limits both their scalability and their expressiveness. Building on the classic ‘Put-That-There’ system, we contribute an exploration of the design space of voice + gesture interaction with spatially distributed IoT devices. Our design space decomposes a user’s IoT command into two components: selection and interaction. We articulate how the permutations of voice and freehand gesture across these two components complement each other to afford interaction possibilities beyond current approaches. We instantiate this design space as a proof-of-concept sensing platform and demonstrate a series of novel IoT interaction scenarios, such as making ‘dumb’ objects smart, commanding robotic appliances, and resolving ambiguous pointing at cluttered devices.
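
As a minimal illustration of this selection/interaction decomposition, the Python sketch below models an IoT command as a pair of modality-tagged steps and enumerates the four voice/gesture permutations. All names here (`Modality`, `Command`, `lamp-3`, `power_on`) are our own illustrative assumptions, not the paper’s implementation.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product

# The abstract's design space: an IoT command decomposes into a selection
# step (which device?) and an interaction step (what action?), and each
# step may be expressed by voice or by freehand gesture.
class Modality(Enum):
    VOICE = "voice"
    GESTURE = "gesture"

@dataclass
class Command:
    selection: Modality    # modality used to pick the target device
    interaction: Modality  # modality used to specify the action
    target: str            # resolved device identifier (hypothetical)
    action: str            # resolved action name (hypothetical)

# Enumerate the four selection x interaction permutations the design space covers.
for sel, inter in product(Modality, Modality):
    print(f"select by {sel.value:8} + interact by {inter.value}")

# Example: point at a lamp (gesture selection), then say "turn on" (voice interaction).
cmd = Command(Modality.GESTURE, Modality.VOICE, target="lamp-3", action="power_on")
```

The gesture-selection + voice-interaction cell is the classic ‘Put-That-There’ pattern; the remaining cells cover cases such as naming a device by voice and then adjusting it with a gesture.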

[1] Sang Ho Yoon et al. Scenariot: Spatially Mapping Smart Things Within Augmented Reality Scenes, 2018, CHI.

[2] Saul Greenberg et al. Proxemic interaction: designing for a proximity and orientation-aware environment, 2010, ITS '10.

[3] Xiang 'Anthony' Chen et al. Snap-To-It: A User-Inspired Platform for Opportunistic Device Interactions, 2016, CHI.

[4] Sharon L. Oviatt et al. Designing the User Interface for Multimodal Speech and Pen-Based Gesture Applications: State-of-the-Art Systems and Future Research Directions, 2000, Hum. Comput. Interact.

[5] Joseph A. Paradiso et al. WristQue: A personal sensor wristband, 2013, IEEE International Conference on Body Sensor Networks.

[6] Yang Zhang et al. Wall++: Room-Scale Interactive and Context-Aware Sensing, 2018, CHI.

[7] Sebastian Boring et al. Proxemic-Aware Controls: Designing Remote Controls for Ubiquitous Computing Ecologies, 2015, MobileHCI.

[8] Hans-Werner Gellersen et al. AmbiGaze: Direct Control of Ambient Devices by Gaze, 2016, Conference on Designing Interactive Systems.

[9] Richard A. Bolt. “Put-that-there”: Voice and gesture at the graphics interface, 1980, SIGGRAPH '80.

[10] Saul Greenberg et al. Multimodal multiplayer tabletop gaming, 2007, CIE.

[11] Karen Holtzblatt et al. Contextual design: using customer work models to drive systems design, 1998, CHI Conference Summary.

[12] Sharon L. Oviatt. Mutual disambiguation of recognition errors in a multimodal architecture, 1999, CHI '99.

[13] Jun Rekimoto et al. iCam: Precise at-a-Distance Interaction in the Physical Environment, 2006, Pervasive.

[14] Michael Beigl. Point & Click - Interaction in Smart Environments, 1999, HUC.

[15] Luca Maria Gambardella et al. Wearable multi-modal interface for human multi-robot interaction, 2016, IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR).

[16] Gierad Laput et al. ViBand: High-Fidelity Bio-Acoustic Sensing Using Commodity Smartwatch Accelerometers, 2016, UIST.

[17] Andy Liaw et al. Classification and Regression by randomForest, 2002, R News.

[18] Ferran Argelaguet et al. A survey of 3D object selection techniques for virtual environments, 2013, Comput. Graph.

[19] Karen Holtzblatt et al. Contextual design, 1997, INTR.

[20] Fernando Seco Granja et al. Comparing Decawave and Bespoon UWB location systems: Indoor/outdoor performance analysis, 2016, International Conference on Indoor Positioning and Indoor Navigation (IPIN).

[21] Roy Want et al. Gesture connect: facilitating tangible interaction with a flick of the wrist, 2007, TEI.

[22] Gierad Laput et al. Deus EM Machina: On-Touch Contextual Functionality for Smart IoT Appliances, 2017, CHI.

[23] Sebastian Boring et al. Gradual engagement: facilitating information exchange between digital devices as a function of proximity, 2012, ITS.

[24] Antonella De Angeli et al. Integration and synchronization of input modes during multimodal human-computer interaction, 1997, CHI.

[25] Philip R. Cohen et al. QuickSet: multimodal interaction for distributed applications, 1997, MULTIMEDIA '97.

[26] Wendy Ju et al. Range: exploring implicit interaction through electronic whiteboard design, 2008, CSCW.

[27] Mani B. Srivastava et al. SeleCon: Scalable IoT Device Selection and Control Using Hand Gestures, 2017, IEEE/ACM Second International Conference on Internet-of-Things Design and Implementation (IoTDI).

[28] Luca Maria Gambardella et al. Robot Identification and Localization with Pointing Gestures, 2018, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[29] Xin Jin et al. SnapLink: Fast and Accurate Vision-Based Appliance Control in Large Commercial Buildings, 2018, Proc. ACM Interact. Mob. Wearable Ubiquitous Technol.

[30] Joëlle Coutaz et al. A design space for multimodal systems: concurrent processing and data fusion, 1993, INTERCHI.

[31] Katashi Nagao et al. The world through the computer: computer augmented interaction with real world environments, 1995, UIST '95.

[32] Bill N. Schilit et al. Context-aware computing applications, 1994, Workshop on Mobile Computing Systems and Applications.

[33] Edward A. Lee et al. HOBS: head orientation-based selection in physical spaces, 2014, SUI.

[34] Sharon L. Oviatt. Ten myths of multimodal interaction, 1999, Commun. ACM.

[35] Jun Rekimoto et al. InfoPoint: A Device that Provides a Uniform User Interface to Allow Appliances to Work Together over a Network, 2001, Personal and Ubiquitous Computing.

[36] Pattie Maes et al. Smarter objects: using AR technology to program physical objects and their interactions, 2013, CHI Extended Abstracts.