Discriminative training for large vocabulary telephone-based name recognition

This paper describes progress on a commercial application of the MECS recognition system: recognizing Japanese family names spoken by customers into the answering machines of a large marketing/human resource company. The task is therefore speaker-independent and open-vocabulary, and it is characterized by large variation in caller speaking styles, telephone types and acoustic environments. Our results show that context-independent hidden Markov models trained discriminatively with the minimum classification error (MCE) criterion are a practical alternative to context-dependent models based on phonetic decision trees: they perform better while using far fewer parameters. On this difficult task we obtain 59% correct family-name recognition. A phoneme-based confidence measure raises correct name recognition to 85% on accepted utterances, at an overall utterance acceptance rate of 15%.
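The abstract does not spell out the MCE criterion. The following is a minimal sketch of the generic, commonly used form of the MCE loss, not necessarily the exact variant used with MECS. It assumes each candidate name j has a discriminant score g_j (for example an HMM log-likelihood). The misclassification measure d_k compares the correct class's score with a smoothed average of its competitors, and a sigmoid turns d_k into a differentiable approximation of the 0/1 error. The smoothing parameter `eta`, sigmoid slope `gamma` and offset `theta` are illustrative assumptions.

```python
import math
from typing import Sequence


def misclassification_measure(scores: Sequence[float], correct: int, eta: float = 1.0) -> float:
    """d_k = -g_k + (1/eta) * log(mean over j != k of exp(eta * g_j)).

    Negative d_k means the correct class wins. As eta grows, the competitor
    term approaches the single best competing score.
    """
    if len(scores) < 2:
        raise ValueError("MCE needs at least one competing class")
    competitors = [s for j, s in enumerate(scores) if j != correct]
    # Log-sum-exp with a max shift for numerical stability.
    m = max(eta * s for s in competitors)
    lse = m + math.log(sum(math.exp(eta * s - m) for s in competitors))
    anti_score = (lse - math.log(len(competitors))) / eta
    return -scores[correct] + anti_score


def mce_loss(scores: Sequence[float], correct: int,
             eta: float = 1.0, gamma: float = 1.0, theta: float = 0.0) -> float:
    """Sigmoid-smoothed classification error in (0, 1); hypothetical defaults."""
    d = misclassification_measure(scores, correct, eta)
    return 1.0 / (1.0 + math.exp(-gamma * d + theta))


# Three candidate names; the correct one (index 0) scores clearly highest,
# so the loss is small. Labeling index 1 as correct gives a loss above 0.5.
print(mce_loss([2.0, 0.0, 0.0], correct=0))
print(mce_loss([2.0, 0.0, 0.0], correct=1))
```

In discriminative training, the gradient of this loss with respect to the HMM parameters is accumulated over the training utterances. Misrecognized or nearly misrecognized utterances dominate that gradient because the sigmoid is steepest near d_k = 0.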