Representation of acoustic and phonetic knowledge for speaker-independent recognition of small vocabularies

Abstract. We present an environment and methodology for representing and processing acoustic, phonetic and lexical knowledge for speech recognition. The proposed tools allow numerical data (signals, parameters, shapes, etc.) and symbolic information (words, phonemes, syllables, features, cues, etc.) to be encoded and processed in a uniform, seamless and dynamic manner. We describe the application of this methodology to the multi-speaker recognition of the names of the 26 letters of the alphabet spoken in French. Although this vocabulary is widely acknowledged to be difficult, the results obtained clearly validate the approach, particularly for acoustically very similar words.