Using Information on Lexical Stress for Utterance Verification

ASR applications such as nationwide telephone directory assistance (DA) face the challenge of making a correct classification from only minimal amounts of acoustic data. For this reason, current systems still make too many errors to be useful. Given that ‘no recognition’ is better than ‘misrecognition’, a feasible system should therefore detect and reject the least reliable hypotheses. This process is known as utterance verification. Against the disadvantage of having little information stands the advantage that isolated utterances show a relatively small degree of prosodic variation, for instance in intonation, speech rate and accent. In this paper we investigate how one can capitalise on this advantage to improve utterance verification. We define a number of confidence measures (CMs) on prosodic features and evaluate several linear combinations of one or more CMs. Experimental results on a field corpus of city names show a relative improvement of 11.0% in Confidence Error Rate compared to a ‘conventional’ system with only a Log Likelihood Ratio CM.
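
As a rough illustration of the evaluation described above, the following Python sketch combines several CMs linearly and scores the resulting accept/reject decisions. It is a minimal sketch under stated assumptions, not the method of this paper: the function names, weights and threshold are hypothetical, and Confidence Error Rate is assumed here to be the fraction of hypotheses that are wrongly accepted or wrongly rejected, a definition the abstract does not spell out.

```python
from typing import Sequence


def combined_confidence(cms: Sequence[float],
                        weights: Sequence[float],
                        bias: float = 0.0) -> float:
    """Linear combination of confidence measures (weights are hypothetical)."""
    if len(cms) != len(weights):
        raise ValueError("need exactly one weight per confidence measure")
    return bias + sum(w * c for w, c in zip(weights, cms))


def confidence_error_rate(scores: Sequence[float],
                          correct: Sequence[bool],
                          threshold: float) -> float:
    """Assumed CER: share of hypotheses with a wrong accept/reject decision.

    A hypothesis is accepted when its score reaches the threshold. Accepting
    a misrecognition or rejecting a correct recognition counts as one error.
    """
    if len(scores) != len(correct) or len(scores) == 0:
        raise ValueError("scores and labels must be non-empty and equally long")
    errors = 0
    for score, is_correct in zip(scores, correct):
        accepted = score >= threshold
        if accepted != is_correct:
            errors += 1
    return errors / len(scores)


def relative_improvement(baseline_cer: float, new_cer: float) -> float:
    """Relative CER reduction with respect to a baseline system."""
    if baseline_cer <= 0:
        raise ValueError("baseline CER must be positive")
    return (baseline_cer - new_cer) / baseline_cer
```

With toy values chosen only for illustration, `confidence_error_rate([0.9, 0.2, 0.6, 0.4], [True, False, False, True], 0.5)` counts one false accept and one false reject among four hypotheses. In practice the weights and the threshold would be tuned on held-out data rather than fixed by hand.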