Although progress has been impressive, there are still several hurdles that speech recognition technology must clear before ubiquitous adoption can be realized. R&D in spontaneous and free-flowing speech style is critical to its success.

BY Li Deng and Xuedong Huang

In recent years, significant progress has been made in advancing speech recognition technology, making speech an effective modality in both telephony and multimodal human-machine interaction. Speech recognition systems have been built and deployed for numerous applications. The technology is not only improving at a steady pace but is also becoming increasingly usable and useful. However, speech recognition has not yet been widely accepted in our society. The current use of speech recognition by enterprises and consumers reflects only the tip of the iceberg of the full power the technology could potentially offer [3]. To realize that potential, the industry has yet to bridge the gap between what people want from speech recognition, typically in a multimodal environment, and what the technology can deliver. To make mainstream use of speech recognition a reality, the industry must deliver robust and high recognition accuracy close to human-like performance.
[1] David Pearce, et al. The AURORA experimental framework for the performance evaluation of speech recognition systems under noisy conditions. INTERSPEECH, 2000.
[2] Sharon L. Oviatt, et al. Breaking the Robustness Barrier: Recent Progress on the Design of Robust Multimodal Systems. Adv. Comput., 2002.
[3] Sadaoki Furui, et al. Recent progress in spontaneous speech recognition and understanding. IEEE Workshop on Multimedia Signal Processing, 2002.
[4] Xuedong Huang, et al. Air- and bone-conductive integrated microphones for robust speech detection and enhancement. IEEE Workshop on Automatic Speech Recognition and Understanding, 2003.
[5] Alex Acero, et al. Spoken Language Processing: A Guide to Theory, Algorithm and System Development. 2001.
[6] Li Deng, et al. Distributed speech processing in MiPad's multimodal user interface. IEEE Trans. Speech Audio Process., 2002.
[7] Benoît Maison, et al. Perceptual interfaces for information interaction: joint processing of audio and visual information for human-computer interaction. INTERSPEECH, 2000.