Survey on Speech, Machine Translation and Gestures in Ambient Assisted Living

In this paper we survey the state of the art in proprietary and free and open source software (FOSS) tools for automatic speech recognition (ASR), speech synthesis, and machine translation (MT). We also address the need for multimodal communication, including gestures, and give examples of 3D gesture recognition software. Our current experiment focuses on interoperability between FOSS ASR, MT, and text-to-speech applications; future experiments will add gesture recognition tools. Our application environment is an ambient assisted living lab at the University of Bremen, designed for the elderly and/or people with impairments. In a nutshell, our goal is to provide a single, uniform multimodal interface that combines FOSS speech processing, MT, and gesture recognition tools for people in need.
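The interoperability experiment described above amounts to chaining independent components so that each tool's output feeds the next one's input. The following is a minimal sketch of such an ASR → MT → TTS pipeline; the stage functions here are hypothetical placeholders, not the actual tools evaluated in the paper.

```python
# Sketch of a speech-to-speech pipeline built from interoperable stages.
# Each stage is a plain function from text to text, so real FOSS engines
# (an ASR decoder, an MT engine, a TTS synthesizer) could be wrapped and
# swapped in without changing the driver code. All names are illustrative.

from typing import Callable, List

Stage = Callable[[str], str]

def run_pipeline(stages: List[Stage], data: str) -> str:
    """Feed the output of each stage into the next, in order."""
    for stage in stages:
        data = stage(data)
    return data

# Placeholder stages standing in for real components.
def asr(audio: str) -> str:
    return f"transcript({audio})"

def mt(text: str) -> str:
    return f"translation({text})"

def tts(text: str) -> str:
    return f"audio({text})"

result = run_pipeline([asr, mt, tts], "speech.wav")
```

Keeping each component behind a uniform text interface is what makes mixing proprietary and FOSS tools, or later inserting a gesture recognition stage, a local change rather than a redesign.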
