PRONUNCIATION MODELING FOR SPONTANEOUS SPEECH BY MAXIMIZING WORD CORRECT RATE IN A PRODUCTION-RECOGNITION MODEL

In this paper, we develop a new method for compiling a pronunciation dictionary that models pronunciation variation in spontaneous speech recognition. The dictionary is assembled by iteratively selecting pronunciations from a data-driven word confusion table. Selection directly maximizes the word correct rate simulated by a production-recognition model, so that the best recognition performance can be achieved. In other words, the compiled dictionary accommodates as many pronunciation variants as necessary while avoiding the confusion that additional variants can introduce during recognition. The word correct rate is simulated with a novel human-machine communication model consisting of a human speech production module and a machine speech recognition module. Experimental results on the LDC Mandarin Call Home and Call Friend corpora show that this approach yields a significant improvement. Furthermore, the framework and theory presented here are applicable to other languages.
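The selection loop described above can be illustrated with a minimal sketch. It is not the procedure evaluated in this paper: the greedy stopping rule, the toy recognition rule (a surface form is assigned to the listed word with the largest prior times production probability), and all names (`simulated_correct_rate`, `greedy_select`, the two-word vocabulary and its probabilities) are assumptions made only for illustration.

```python
from typing import Dict, Set, Tuple

Entry = Tuple[str, str]  # (word, pronunciation); hypothetical representation


def simulated_correct_rate(prior: Dict[str, float],
                           production: Dict[str, Dict[str, float]],
                           dictionary: Set[Entry]) -> float:
    """Expected fraction of words recognized correctly under a toy model.

    Production: word w is spoken as surface form s with probability production[w][s].
    Recognition (assumed): s is decoded as the dictionary word that lists s and
    has the largest prior * production probability; unlisted forms count as errors.
    """
    rate = 0.0
    for word, surfaces in production.items():
        for surface, p in surfaces.items():
            listed = sorted(w for (w, s) in dictionary if s == surface)
            if not listed:
                continue  # surface form not covered by the dictionary
            decoded = max(
                listed,
                key=lambda w: prior.get(w, 0.0) * production.get(w, {}).get(surface, 0.0),
            )
            if decoded == word:
                rate += prior[word] * p
    return rate


def greedy_select(prior, production, base: Set[Entry], candidates: Set[Entry]):
    """Add one candidate pronunciation per iteration while the simulated rate improves."""
    dictionary = set(base)
    current = simulated_correct_rate(prior, production, dictionary)
    remaining = set(candidates) - dictionary
    while remaining:
        scored = [
            (simulated_correct_rate(prior, production, dictionary | {entry}), entry)
            for entry in sorted(remaining)
        ]
        best_rate, best_entry = max(scored)
        if best_rate <= current + 1e-12:
            break  # no remaining candidate raises the simulated rate
        dictionary.add(best_entry)
        remaining.remove(best_entry)
        current = best_rate
    return dictionary, current


# Toy data (invented): both words can surface as the ambiguous form "x".
prior = {"A": 0.6, "B": 0.4}
production = {"A": {"a1": 0.7, "x": 0.3}, "B": {"b1": 0.8, "x": 0.2}}
base = {("A", "a1"), ("B", "b1")}          # canonical pronunciations
candidates = {("A", "x"), ("B", "x")}      # variants from a confusion table

dictionary, rate = greedy_select(prior, production, base, candidates)
print(sorted(dictionary), round(rate, 2))
```

In this toy setting the variant "x" is added for word A only. Listing it for B as well would not raise the simulated rate, because the recognizer would still decode "x" as A; this mirrors the trade-off between coverage and confusion described in the abstract.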