Speech Technologies for Advanced Applications in Service Robotics

A multimodal interface was designed for controlling the functions of a complex modular robotic system that can be deployed in difficult conditions such as rescue work, natural disasters, fires, and decontamination. The interface involves several fundamental technologies: speech recognition, speech synthesis, and dialogue management. To enable a human operator to cooperate with the robotic system, a sophisticated architecture was designed and these technologies were implemented. The automatic speech recognition system is based on hidden Markov models and allows the functions of the system to be controlled with a set of voice commands. A text-to-speech system was prepared to provide feedback to the operator, and a dialogue manager was adopted to handle the information exchange between the operator and the robotic system. The proposed system is further equipped with an acoustic event detection system, which consists of a set of five microphones integrated on the robotic vehicle, a post-processing unit, and a detection unit.
