Adapting Speech Recognition in Augmented Reality for Mobile Devices in Outdoor Environments

This paper describes the process of integrating automatic speech recognition (ASR) into a mobile application and explores the benefits and challenges of combining speech with augmented reality (AR) in outdoor environments. Augmented reality lets end-users interact with the displayed information and perform tasks, while enriching their perception of the real world with virtual information. Speech is the most natural form of communication: it allows hands-free interaction and can give end-users quick and easy access to the available features. Speech recognition technology is available on most current mobile devices, but it typically relies on an Internet connection to obtain transcripts from remote servers, e.g., Google speech recognition. In some outdoor environments, however, Internet access is unavailable or of poor quality. We therefore integrated an off-line ASR module, which requires no Internet connection, into an AR application for outdoor use. Speech interaction is currently used within the application to access five features: taking a photo, shooting a film, communicating, performing messaging-related tasks, and requesting information (geographic, biometric, or climatic). The application provides solutions for managing and interacting with the mobile device while offering good usability. We compared the on-line and off-line speech recognition systems to assess their adequacy for these tasks, testing both under conditions commonly found in outdoor environments, such as varying Internet access quality, presence of noise, and distractions.
