Improving Multilingual Interaction for Consumer Robots through Signal Enhancement in Multichannel Speech

Social robotics has become a trend in contemporary robotics research, since social robots can be used successfully in a wide range of applications. One of the most fundamental communication skills a consumer robot must have is oral interaction with a human, in order to provide feedback or accept commands. Several well-established Automatic Speech Recognition (ASR) tools exist, but they do not deliver adequate results, especially in less popular languages and, more importantly, under noisy conditions. This paper investigates different voice activity detection and noise elimination methodologies for use in ASR-based oral interaction with an affordable, low-budget robot, NAO v4. Acoustically semi-stationary environments are assumed, which, in conjunction with the high background noise of the NAO's microphones, make successful ASR quite difficult.
