In this paper, we propose integrating an Environmental Sniffing [1] framework into an in-vehicle hands-free digit recognition task. The Environmental Sniffing framework focuses on detecting, classifying, and tracking changing acoustic environments. Here, we extend the framework to detect and track acoustic environmental conditions in a noisy-speech audio stream. The knowledge extracted about the acoustic environmental conditions is used to determine which environment-dependent acoustic model to use. The Critical Performance Rate (CPR), previously considered in [1], is formulated and calculated for this task. The sniffing framework is compared, in terms of Word Error Rate (WER) and CPU usage, to a ROVER solution for automatic speech recognition (ASR) that combines different noise-conditioned recognizers. Results show that the model matching scheme, which uses the knowledge extracted from the audio stream by Environmental Sniffing, outperforms the ROVER solution in both accuracy and computation: a relative 11.1% WER improvement is achieved with a relative 75% reduction in CPU resources.
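The model matching idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the environment classifier, the environment labels, and the stub recognizers are all hypothetical stand-ins, and the example only shows the control flow of decoding each segment with the single recognizer matched to its detected environment, rather than running every recognizer and voting as a ROVER-style combination would.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types: a segment is a list of samples, a recognizer returns text.
Segment = Sequence[float]
Recognizer = Callable[[Segment], str]


def sniff_and_recognize(
    segments: Sequence[Segment],
    classify_env: Callable[[Segment], str],
    recognizers: Dict[str, Recognizer],
    fallback: str,
) -> List[Tuple[str, str]]:
    """Decode each segment with only the recognizer matched to its environment."""
    results = []
    for seg in segments:
        env = classify_env(seg)  # the "sniffing" step (assumed classifier)
        # Fall back to a default model if the detected label has no recognizer.
        rec = recognizers.get(env, recognizers[fallback])
        results.append((env, rec(seg)))
    return results


def relative_reduction(baseline: float, proposed: float) -> float:
    """Relative reduction in percent, e.g. for WER or decoder passes."""
    return 100.0 * (baseline - proposed) / baseline


# Toy stand-ins (assumptions, not from the paper).
def toy_classify_env(seg: Segment) -> str:
    mean_abs = sum(abs(x) for x in seg) / len(seg)
    return "noisy" if mean_abs > 0.5 else "quiet"


toy_recognizers: Dict[str, Recognizer] = {
    "quiet": lambda seg: "quiet-model",
    "noisy": lambda seg: "noisy-model",
}

decoded = sniff_and_recognize(
    [[0.1, 0.1], [0.9, 0.8]], toy_classify_env, toy_recognizers, fallback="quiet"
)
```

In this sketch each segment costs one decoder pass regardless of how many environment-dependent models exist, whereas a combination scheme that runs every recognizer costs one pass per model. The helper `relative_reduction` simply expresses such a comparison as a percentage; any counts passed to it here are illustrative and are not taken from the paper.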
[1] Pedro J. Moreno, et al. Speech recognition in noisy environments. 1996.
[2] John H. L. Hansen, et al. Unsupervised audio stream segmentation and clustering via the Bayesian information criterion. INTERSPEECH, 2000.
[3] John H. L. Hansen, et al. "CU-move": analysis & corpus development for interactive in-vehicle speech systems. INTERSPEECH, 2001.
[4] Jonathan G. Fiscus, et al. A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER). 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings, 1997.
[5] Yifan Gong, et al. Speech recognition in noisy environments: A survey. Speech Commun., 1995.