The effect of speech and audio compression on speech recognition performance

This paper presents an in-depth look at the influence of different speech and audio codecs on the performance of our continuous speech recognition engine. GSM full rate, G.711, G.723.1 and MPEG coders are investigated. It is shown that MPEG transcoding degrades speech recognition performance at low bitrates, whereas performance remains acceptable for specialized speech coders like GSM or G.711. A new strategy is proposed to cope with the degradation due to low-bitrate coding: the acoustic models of the speech recognition system are trained with transcoded speech (one acoustic model for each speech/audio codec). First results show that this strategy recovers acceptable performance.
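The matched-training idea can be illustrated with a minimal sketch, which makes several assumptions. Continuous mu-law companding with 8-bit quantization stands in for a codec; it approximates the companding principle behind G.711 but is not a bit-exact implementation of the standard. The names `transcode`, `CODECS` and `build_training_sets` are hypothetical, and the acoustic model training itself is left out. The sketch only shows how each training utterance would be passed through an encode/decode round trip, giving one training set per codec.

```python
import math

MU = 255  # mu-law compression parameter (value used by mu-law G.711)


def mulaw_encode(x: float) -> int:
    """Compress one sample in [-1, 1] to an 8-bit code in 0..255."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1.0) / 2.0 * 255))


def mulaw_decode(code: int) -> float:
    """Expand an 8-bit code back to a sample in [-1, 1]."""
    y = code / 255 * 2.0 - 1.0
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def transcode(samples):
    """Encode/decode round trip: the signal a recognizer sees after coding."""
    return [mulaw_decode(mulaw_encode(s)) for s in samples]


# Hypothetical codec registry: each entry maps a codec name to a round-trip
# function. Real GSM, G.723.1 or MPEG coders would be plugged in here.
CODECS = {
    "clean": lambda samples: list(samples),
    "mulaw": transcode,
}


def build_training_sets(corpus):
    """Return one transcoded copy of the corpus per codec.

    Each copy would be used to train its own acoustic model.
    """
    return {name: [codec(utt) for utt in corpus] for name, codec in CODECS.items()}
```

At recognition time, the acoustic model trained on the matching transcoded set would be selected according to the codec applied to the incoming speech.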
