'Early recognition' of words in continuous speech

In this paper, we present an automatic speech recognition (ASR) system that combines an automatic phone recogniser with SpeM, a computational model of human speech recognition. In addition to performing normal speech recognition, the system computes 'word activations' while recognition is still in progress, whereas conventional ASR architectures only provide output after the end of an utterance. We explain the notion of word activation and show that it can be used for 'early recognition', i.e. recognising a word before the end of the word is available. Our ASR system was tested on 992 continuous speech utterances, each containing at least one target word: a city name of at least two syllables. Early recognition was obtained for 72.8% of the target words that were recognised correctly. We also show that word activation can be used as an effective confidence measure.
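As an illustration only, the idea of early recognition can be sketched as a threshold test on a word's activation over time. The abstract does not specify how SpeM computes activations or decides that a word has been recognised, so everything below is an assumption made for the example: the per-frame activation list, the fixed threshold, and the helper names `first_recognition_frame` and `is_early` are all hypothetical.

```python
from typing import Optional, Sequence


def first_recognition_frame(activations: Sequence[float],
                            threshold: float) -> Optional[int]:
    """Return the first frame index whose activation reaches the threshold.

    Hypothetical decision rule: the word counts as recognised as soon as
    its activation is at or above `threshold`. Returns None if that never
    happens.
    """
    for frame, value in enumerate(activations):
        if value >= threshold:
            return frame
    return None


def is_early(activations: Sequence[float],
             threshold: float,
             word_end_frame: int) -> bool:
    """True if the word is recognised strictly before its final frame."""
    frame = first_recognition_frame(activations, threshold)
    return frame is not None and frame < word_end_frame


# Made-up activation trace for one target word (one value per frame);
# these numbers are illustrative and do not come from the paper.
trace = [0.05, 0.10, 0.30, 0.65, 0.85, 0.95]
word_end_frame = 5  # assumed index of the word's last frame

print(first_recognition_frame(trace, threshold=0.8))
print(is_early(trace, threshold=0.8, word_end_frame=word_end_frame))
```

Under this toy rule, the same activation value that triggers the decision could also be reported as a confidence score, which is one simple way to read the claim that word activation works as a confidence measure.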
