Humans and smart environments: a novel multimodal interaction approach

In this paper, we describe a multimodal approach for human-smart environment interaction. The input interaction is based on three modalities: deictic gestures, symbolic gestures, and isolated spoken words. The deictic gesture is interpreted using the PTAMM (Parallel Tracking and Multiple Mapping) method with a camera that is handheld or worn on the user's arm. The PTAMM algorithm tracks in real time the position and orientation of the user's hand in the environment. This information is used to point at real or virtual objects, previously registered in the environment, along the camera's optical axis. Symbolic hand gestures and isolated voice commands are recognized and used to interact with the pointed target. Haptic and acoustic feedback is provided to the user to improve the quality of the interaction. A complete prototype has been implemented, and a first usability evaluation conducted with 10 users has shown positive results.
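The pointing mechanism can be illustrated with a minimal sketch: given the camera pose estimated by a PTAMM-style tracker (a position and a rotation in the map frame), the pointed target is the registered object that lies closest to the camera's optical axis. The object registry, function names, and the 5-degree angular threshold below are illustrative assumptions, not part of the original system.

```python
import numpy as np

# Hypothetical registry of objects previously added to the environment,
# each with a 3-D position expressed in the map/world frame (assumed data).
OBJECTS = {
    "lamp":    np.array([1.2, 0.4, 2.0]),
    "display": np.array([-0.5, 1.1, 3.2]),
}

def pointed_object(cam_position, cam_rotation, max_angle_deg=5.0):
    """Return the name of the registered object closest to the camera's
    optical axis, or None if nothing lies within the angular threshold.

    cam_position: (3,) camera position in the world frame.
    cam_rotation: (3, 3) rotation matrix mapping camera axes to world axes.
    """
    # The optical axis is the camera's +Z direction expressed in world coordinates.
    optical_axis = cam_rotation @ np.array([0.0, 0.0, 1.0])

    best_name, best_angle = None, np.radians(max_angle_deg)
    for name, pos in OBJECTS.items():
        to_object = pos - cam_position
        dist = np.linalg.norm(to_object)
        if dist < 1e-6:
            continue
        # Angle between the optical axis and the direction to the object.
        cos_angle = np.clip(np.dot(optical_axis, to_object / dist), -1.0, 1.0)
        angle = np.arccos(cos_angle)
        if angle < best_angle:
            best_name, best_angle = name, angle
    return best_name
```

In such a setup, a recognized symbolic gesture or voice command would then be dispatched to the selected target, e.g. acting on `pointed_object(position, rotation)` when the speech recognizer emits a "switch on" command.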
