On an application of phonological knowledge in automatic speech recognition

Abstract

Automatic speech recognition (ASR) based on segments is preferred for its flexibility and power, despite the difficulty of extracting segmental information from the acoustic speech stream. This paper argues that even successful segment labeling will not bring all the expected benefits, and that some higher-level knowledge is needed. An implementation of phonological knowledge is introduced, using finite state transducers (FSTs) and based on the “two-level” morphological parser of Koskenniemi [Koskenniemi, K. (1983). Two-level morphology. Texas Linguistic Forum, 22, 1–167] and Ritchie et al. [Ritchie et al. (1987). The Edinburgh-Cambridge Morphological Analyser and Dictionary System (Prototype: Version 2.4) User Manual. Cambridge University Computer Manual]. This implementation provides a means of displaying the advantages and remaining difficulties of phonology in an ASR system. A disadvantage of the approach is the over-generation of hypothesized lexical strings in response to a given input, and the remainder of the paper investigates ways of restricting this over-generation. The methods involve both further recourse to the acoustic signal (detection of the speaker's accent) and the invocation of further higher-level (morphological and syntactic) knowledge.
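
The over-generation problem can be illustrated with a minimal sketch. It is not the paper's implementation: it enumerates lexical:surface symbol pairs directly rather than compiling a transducer, and the rule set, the symbol inventory ("@" standing in for schwa, spelling-like letters standing in for phonemes), the lexicon, and the names `RULES`, `lexical_options` and `analyse` are all assumptions made for illustration.

```python
from itertools import product

BILABIALS = {"p", "b", "m"}

# Hypothetical lexical:surface correspondences (not taken from the paper).
# Each entry is (lexical symbol, surface symbol, right-hand surface context);
# a context of None means the correspondence is allowed anywhere.
RULES = [
    ("n", "m", BILABIALS),  # lexical n may surface as m before a bilabial
    ("a", "@", None),       # vowels may reduce to schwa, written "@"
    ("e", "@", None),
    ("o", "@", None),
]


def lexical_options(surface, i):
    """Return the lexical symbols that may underlie surface[i]."""
    options = [surface[i]]  # the identity pair is always allowed
    following = surface[i + 1] if i + 1 < len(surface) else None
    for lexical, surf, right_context in RULES:
        if surf == surface[i] and (right_context is None or following in right_context):
            options.append(lexical)
    return options


def analyse(surface):
    """Enumerate every lexical string the rules allow for a surface string."""
    per_position = [lexical_options(surface, i) for i in range(len(surface))]
    return ["".join(symbols) for symbols in product(*per_position)]


# Toy lexicon standing in for higher-level (lexical) knowledge.
LEXICON = {"common", "cannon", "commit"}

candidates = analyse("c@mm@n")
accepted = [c for c in candidates if c in LEXICON]
print(len(candidates), accepted)
```

With only four correspondences, a six-symbol input already yields 32 hypothesized lexical strings, of which the toy lexicon accepts one. Adding optional rules multiplies the candidate set, which is why some further source of constraint is needed.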