Differences between acoustic characteristics of spontaneous and read speech and their effects on speech recognition performance

Although speech derived from read texts, news broadcasts, and other similarly prepared contexts can be recognized with high accuracy, recognition performance decreases drastically for spontaneous speech. This is because spontaneous speech and read speech differ significantly both acoustically and linguistically. This paper statistically and quantitatively analyzes differences in acoustic features between spontaneous and read speech using two large-scale speech corpora, the "Corpus of Spontaneous Japanese (CSJ)" and "Japanese Newspaper Article Sentences (JNAS)". Experimental results show that spontaneous speech is characterized by a reduced spectral space compared with read speech, and that the more spontaneous the speech, the more the spectral space shrinks. This paper also shows that a reduction in the spectral space leads to a reduction in phoneme recognition accuracy. This result indicates that spectral reduction is one major reason for the decrease in recognition accuracy for spontaneous speech.
