Audio-visual speech recognition techniques in augmented reality environments

Many recent studies show that Augmented Reality (AR) and Automatic Speech Recognition (ASR) technologies can help people with disabilities. However, most of these studies have remained within their own specialized fields. Audio-Visual Speech Recognition (AVSR) is an advance in ASR technology that combines audio, video, and facial expressions to capture a narrator's voice. In this paper, we combine AR and AVSR technologies into a new system to help deaf and hard-of-hearing people. The proposed system captures a narrator's speech in real time, converts it into readable text, and shows the text directly on an AR display. Deaf people can therefore read the narrator's speech easily, and hearing people do not need to learn sign language to communicate with them. The evaluation results show that the system has a lower word error rate than audio-only ASR and visual-only speech recognition (VSR) under different noisy conditions, and that using AVSR techniques improves the system's recognition accuracy in noisy places. In addition, a survey of 100 deaf people shows that more than 80% of respondents are very interested in using our system as an assistant on portable devices to communicate with other people.
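The abstract compares systems by word error rate. As a minimal sketch of how that metric is commonly computed (word-level edit distance divided by the number of reference words), the Python function below may help; the function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the paper, and the paper's own scoring procedure may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words.

    Illustrative sketch only; tokenization here is plain whitespace splitting.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion
    # against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

A lower value means fewer recognition errors; because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.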
