Using Semantics to Automatically Generate Speech Interfaces for Wearable Virtual and Augmented Reality Applications

This paper presents a framework for automatically generating speech-based interfaces for controlling virtual reality (VR) and augmented reality (AR) applications on wearable devices. Starting from a set of natural language descriptions of application functionalities and a catalog of general-purpose icons annotated with their possible implied meanings, the framework creates the vocabulary and grammar for the speech recognizer, as well as a graphical interface for the target application in which the icons are expected to evoke the available commands. To minimize the user's cognitive load during interaction, a semantics-based optimization mechanism finds the best mapping between icons and functionalities and expands the set of valid commands. The framework was evaluated with see-through glasses for AR-based maintenance and repair operations. A set of experiments was designed to objectively and subjectively assess the first-time user experience of the automatically generated interface against that of a fully personalized interface. Moreover, the intuitiveness of the automatically generated interface was studied by analyzing the results obtained by trained users on the same interface. Objective measurements (false positives, false negatives, task completion rate, and average number of attempts needed to activate a functionality) and subjective measurements (covering system response accuracy, likeability, cognitive demand, annoyance, habitability, and speed) show that first-time and experienced users perform very similarly with the proposed framework's interface, and that their performance is comparable with that of both reference interfaces.
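The icon-to-functionality mapping described above can be illustrated with a minimal sketch. Assuming a pairwise semantic similarity matrix between icon annotations and functionality descriptions has already been computed (e.g., from lexical distances in a WordNet-style resource), the optimization reduces to finding the one-to-one assignment that maximizes total similarity. All names and scores below are hypothetical; an exhaustive search is shown for clarity, which is feasible only for small catalogs.

```python
from itertools import permutations

# Hypothetical similarity scores between icon meanings and
# functionality descriptions (rows: icons, columns: functionalities).
icons = ["magnifier", "gear", "arrow"]
functions = ["zoom", "settings", "next step"]
similarity = [
    [0.9, 0.1, 0.2],  # magnifier vs. zoom / settings / next step
    [0.2, 0.8, 0.1],  # gear
    [0.1, 0.2, 0.7],  # arrow
]

def best_mapping(sim):
    """Find the icon-to-functionality assignment that maximizes
    total semantic similarity by exhaustive search."""
    n = len(sim)
    best_score, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        score = sum(sim[i][perm[i]] for i in range(n))
        if score > best_score:
            best_score, best_perm = score, perm
    return best_score, best_perm

score, assignment = best_mapping(similarity)
for i, j in enumerate(assignment):
    print(f"{icons[i]} -> {functions[j]}")
```

For realistic catalog sizes, the same objective would be solved with a polynomial-time assignment algorithm (e.g., the Hungarian method) rather than by enumerating permutations.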
