Experiments on recognition of lavalier microphone speech and whispered speech in real world environments

In this paper, we present corpora and recognition experiments on speech recorded in everyday settings, aimed at real-world speech recognition. We built a speech corpus of 8,600 sentences from 53 speakers, recorded through lavalier microphones in four different environments: an office, a soundproof room, cars of different sizes, and the street. We also built a second corpus of whispered and normal speech, comprising more than 6,000 sentences from 100 speakers recorded through a close-talking microphone. Continuous speech recognition experiments using acoustic models trained on the speech corpus from each environment attain a recognition accuracy above 80%. For the whispered speech corpus, the recognition accuracy obtained with the whispered speech model was 74%.
