Genericity and Adaptability Issues for Task-Independent Speech Recognition

The last decade has witnessed major advances in core speech recognition technology, with today's systems able to recognize continuous speech from many speakers without the need for an explicit enrollment procedure. Despite these improvements, speech recognition is far from being a solved problem. Most recognition systems are tuned to a particular task, and porting a system to another task or language is both time-consuming and expensive. Our recent work addresses issues in speech recognizer portability, with the goal of developing generic core speech recognition technology. In this paper, we first assess the genericity of wide-domain models by evaluating their performance on several tasks. Transparent methods are then used to adapt the generic acoustic and language models to a specific task. Unsupervised acoustic model adaptation is contrasted with supervised adaptation, and a system-in-loop scheme for incremental unsupervised adaptation of the acoustic and language models is investigated. Experiments on a spontaneous dialog task show that, with the proposed scheme, a transparently adapted generic system can perform nearly as well (about a 1% absolute gap in word error rate) as a task-specific system trained on several tens of hours of manually transcribed data.
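The system-in-loop scheme amounts to a self-training cycle: as untranscribed task data arrives, the current models produce automatic transcripts, and those transcripts in turn drive acoustic and language model adaptation. The Python sketch below illustrates only this control flow under our own assumptions; the function names, data structures, and adaptation steps are hypothetical placeholders, not the authors' implementation.

```python
# Minimal sketch of an incremental, unsupervised (system-in-loop) adaptation
# cycle. All functions and structures here are illustrative stand-ins.
from dataclasses import dataclass


@dataclass
class Models:
    acoustic: dict   # stand-in for acoustic model parameters
    language: dict   # stand-in for language model parameters


def decode(models, audio_batch):
    """Produce automatic transcripts for untranscribed task audio (placeholder)."""
    return [f"hypothesis for {utt}" for utt in audio_batch]


def adapt_acoustic(models, audio_batch, hypotheses):
    """Unsupervised acoustic adaptation using the automatic transcripts (placeholder)."""
    return models.acoustic


def adapt_language(models, hypotheses):
    """Adapt the generic language model toward the task, e.g. by mixing in
    statistics estimated on the automatic transcripts (placeholder)."""
    return models.language


def system_in_loop(models, incoming_batches):
    """Incrementally adapt generic models as untranscribed task data arrives."""
    for audio_batch in incoming_batches:
        hypotheses = decode(models, audio_batch)      # no manual transcription used
        models.acoustic = adapt_acoustic(models, audio_batch, hypotheses)
        models.language = adapt_language(models, hypotheses)
    return models


if __name__ == "__main__":
    generic = Models(acoustic={}, language={})
    batches = [["utt_001", "utt_002"], ["utt_003"]]
    adapted = system_in_loop(generic, batches)
```

The key property this sketch is meant to convey is that no manually transcribed task data enters the loop: the recognizer's own hypotheses serve as the adaptation supervision, which is what makes the adaptation transparent to the system developer.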