Exploiting alternative acoustic sensors for improved noise robustness in speech communication

Abstract: This study investigates the use of non-conventional, body-conductive acoustic sensors in human-to-human speech communication and automatic speech recognition. Body-conductive sensors are attached directly to the speaker and pick up the uttered speech through the skin and bones, resulting in higher robustness against environmental noise. A throat microphone, an ear-bone microphone, and a standard microphone were evaluated using subjective speech intelligibility tests and automatic speech recognition experiments. In addition to evaluating each sensor on its own, several sensor-integration methods were applied to achieve higher recognition rates; specifically, multi-stream hidden Markov model (HMM) decision fusion and late fusion were used to combine the sensors. Late fusion yielded relative recognition rate improvements of 40% in a noisy environment and 24% in a clean environment. For late fusion, a novel adaptive weighting method that requires no pre-adjustment of the weights was introduced. A technique is also presented for automatically segmenting noisy speech data by using a body-conductive sensor in conjunction with the desired microphone during recording. Finally, the Lombard effect observed when using body-conductive acoustic sensors was investigated.
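
The late fusion mentioned above combines the outputs of recognizers run separately on each sensor, with adaptive weights that need no manual pre-tuning. The abstract does not detail the weighting rule, so the sketch below is only one plausible reading: each stream's weight is derived from the gap between its best and second-best class scores. The function names (adaptive_weight, late_fusion) and the toy scores are illustrative, not the authors' implementation; in practice the per-class scores would come from HMM decoders for the standard, throat, and ear-bone microphone channels.

```python
import numpy as np

def adaptive_weight(log_likelihoods):
    """Confidence weight from the spread of one stream's class scores.

    A large gap between the best and second-best hypothesis is treated as
    high confidence. This is only one plausible adaptive weighting; the
    paper's actual scheme may differ. Assumes at least two classes.
    """
    sorted_ll = np.sort(log_likelihoods)[::-1]          # descending
    return max(sorted_ll[0] - sorted_ll[1], 1e-6)

def late_fusion(stream_scores):
    """Weighted score-level fusion of per-class log-likelihoods.

    stream_scores: list of 1-D arrays, one per sensor, each holding the
    log-likelihood of every candidate word/class for that sensor.
    Returns the index of the winning class after weighted combination.
    """
    weights = np.array([adaptive_weight(s) for s in stream_scores])
    weights /= weights.sum()                             # normalise, no manual tuning
    fused = sum(w * s for w, s in zip(weights, np.asarray(stream_scores)))
    return int(np.argmax(fused))

# Toy example: three sensors scoring four candidate words
scores = [np.array([-10.0, -12.5, -11.0, -13.0]),        # standard microphone
          np.array([-9.0, -9.2, -9.1, -9.3]),            # throat microphone (low confidence)
          np.array([-8.0, -14.0, -12.0, -15.0])]         # ear-bone microphone
print(late_fusion(scores))                               # -> 0
```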
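
The abstract also mentions segmenting noisy recordings automatically by exploiting the body-conductive channel captured in parallel with the desired microphone. A minimal sketch of that idea is given below, assuming synchronous, time-aligned channels and a simple frame-energy detector with a fixed threshold; the paper's actual segmentation procedure is not specified in the abstract, and segment_with_reference is a hypothetical helper.

```python
import numpy as np

def segment_with_reference(ref, noisy, sr, frame_ms=25, hop_ms=10, thresh_db=-40.0):
    """Find speech boundaries on the clean body-conducted channel and reuse
    them for the time-aligned noisy microphone recording.

    ref, noisy : 1-D float arrays of equal length (synchronously recorded)
    Returns a list of (start_sample, end_sample) speech segments.
    The fixed energy threshold is an illustrative choice only.
    """
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    energies = []
    for start in range(0, len(ref) - frame, hop):
        e = np.mean(ref[start:start + frame] ** 2) + 1e-12
        energies.append(10 * np.log10(e))                # frame energy in dB
    active = np.array(energies) > thresh_db

    segments, start_idx = [], None
    for i, a in enumerate(active):
        if a and start_idx is None:
            start_idx = i                                # speech onset
        elif not a and start_idx is not None:
            segments.append((start_idx * hop, i * hop + frame))
            start_idx = None
    if start_idx is not None:
        segments.append((start_idx * hop, len(ref)))
    return segments

# Usage: the same boundaries index the noisy channel directly, e.g.
# noisy_segments = [noisy[s:e] for s, e in segment_with_reference(ref, noisy, 16000)]
```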
