Speaker-independent recognition of spoken English letters

A description is presented of EAR, an English alphabet recognizer that performs speaker-independent recognition of letters spoken in isolation. During recognition, (a) signal processing routines transform the digitized speech into useful representations, (b) rules are applied to these representations to locate segment boundaries, (c) feature measurements are computed on the speech segments, and (d) a neural network uses the feature measurements to classify the letter. The system was trained on one token of each letter from 120 speakers. Recognition accuracy was 95% when tested on a new set of 30 speakers, and 96% when tested on a second token of each letter from the original 120 speakers. This accuracy is 6% higher than that of previously reported systems. The high level of performance is attributed to accurate and explicit phonetic segmentation, the use of speech knowledge to select features that measure the important linguistic information, and the ability of the neural classifier to model the variability of the data.
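To make the four-stage pipeline concrete, the sketch below lays the stages out as functions. The paper provides no code, so every routine, threshold, and the toy network here are hypothetical stand-ins under simplifying assumptions (a single energy contour as the representation, an energy-threshold segmentation rule, a fixed-weight network), not the authors' implementation.

```python
"""Illustrative sketch of the (a)-(d) recognition pipeline; all names are assumed."""
import numpy as np


def frame_energies(waveform: np.ndarray, frame_len: int = 160) -> np.ndarray:
    """(a) Signal processing: reduce digitized speech to a per-frame log-energy
    contour (a stand-in for the paper's richer representations)."""
    n_frames = len(waveform) // frame_len
    frames = waveform[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.log1p(np.sum(frames.astype(float) ** 2, axis=1))


def segment_boundaries(energy: np.ndarray) -> tuple:
    """(b) Rule-based segmentation: mark where the letter token begins and ends
    by thresholding the energy contour (an assumed, simplified rule)."""
    threshold = energy.min() + 0.2 * (energy.max() - energy.min())
    voiced = np.where(energy > threshold)[0]
    return (int(voiced[0]), int(voiced[-1])) if len(voiced) else (0, len(energy) - 1)


def measure_features(energy: np.ndarray, bounds: tuple) -> np.ndarray:
    """(c) Feature measurement on the located segment: duration and coarse
    energy statistics (placeholders for the paper's phonetic features)."""
    start, end = bounds
    seg = energy[start : end + 1]
    return np.array([float(end - start), seg.mean(), seg.max(), seg.std()])


class TinyNetClassifier:
    """(d) A one-hidden-layer network with fixed random weights, standing in
    for the trained neural classifier; it only illustrates the interface."""

    def __init__(self, n_features: int = 4, n_hidden: int = 8, n_classes: int = 26):
        rng = np.random.default_rng(0)
        self.w1 = rng.normal(size=(n_features, n_hidden))
        self.w2 = rng.normal(size=(n_hidden, n_classes))

    def predict(self, features: np.ndarray) -> str:
        hidden = np.tanh(features @ self.w1)
        scores = hidden @ self.w2
        return chr(ord("A") + int(np.argmax(scores)))


def recognize_letter(waveform: np.ndarray, classifier: TinyNetClassifier) -> str:
    """Run one isolated-letter utterance through stages (a)-(d)."""
    energy = frame_energies(waveform)
    bounds = segment_boundaries(energy)
    features = measure_features(energy, bounds)
    return classifier.predict(features)
```

In this sketch, only the control flow mirrors the abstract: a real system would substitute the actual spectral representations, phonetic segmentation rules, linguistically motivated features, and a classifier trained on the 120-speaker set.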