Use of real and contaminated speech for training of a hands-free in-car speech recognizer

A database of in-car speech for the Italian language was collected under the European projects SpeechDatCar and VODIS II. It consists of 600 sessions recorded under various noise and driving conditions and includes close-talk signals and far microphone signals for hands-free interaction. This paper describes some recognition experiments on two tasks conceived on a portion of this database: connected digit sequences and isolated command words. Recognition rate achieved by means of HMMs trained on real in-car speech is compared with that accomplished by a speech contamination approach, which aims at simulating in-car data starting from a clean speech corpus. Recognition performance is also analyzed as a function of the different noise conditions and of the consequent SNR at the far microphones. Finally, the effect of HMM adaptation is investigated in order to tune the recognizer on the conditions of the various sessions.
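The abstract does not spell out how the speech contamination is carried out. A common way to simulate far-microphone data from clean speech, shown here only as a minimal sketch and not as the authors' procedure, is to filter the clean signal with an impulse response of the acoustic path and then add recorded noise scaled to a chosen SNR. The function names, the three-tap impulse response, and the synthetic signals below are all hypothetical and serve illustration only.

```python
import math
import random


def convolve(signal, impulse_response):
    """Direct-form FIR filtering, truncated to the length of the input."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(impulse_response):
            if n - k < 0:
                break
            acc += h * signal[n - k]
        out[n] = acc
    return out


def power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def measure_snr_db(signal, noise):
    """SNR in dB between a signal and a noise sequence."""
    return 10.0 * math.log10(power(signal) / power(noise))


def contaminate(clean, impulse_response, noise, target_snr_db):
    """Assumed contamination model: filter clean speech, then add scaled noise.

    The noise gain is chosen so that the ratio between the filtered speech
    power and the added noise power equals target_snr_db.
    """
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    filtered = convolve(clean, impulse_response)
    noise = noise[: len(filtered)]
    gain = math.sqrt(power(filtered) / (power(noise) * 10.0 ** (target_snr_db / 10.0)))
    return [s + gain * v for s, v in zip(filtered, noise)]


# Illustrative inputs (synthetic, not taken from the database).
rng = random.Random(0)
clean = [math.sin(2.0 * math.pi * 0.01 * n) for n in range(2000)]
impulse_response = [1.0, 0.5, 0.25]  # hypothetical short acoustic path
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
noisy = contaminate(clean, impulse_response, noise, target_snr_db=5.0)
```

Sweeping `target_snr_db` over the range observed at the far microphones would yield training material matched to different noise conditions, which is the kind of comparison against real in-car data that the abstract describes.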
