Speaker‐independent recognition of English letters

An isolated word recognition system is described which classifies English letters using features extracted from speech. Speech was sampled at 16 000 Hz, and pitch, utterance endpoints, and the locations of vowel onset and offset were computed directly from the digitized data. A 128‐point DFT was performed every 3 ms using a 20‐ms window, and frequency samples were linearly compressed to 54 coefficients spanning the frequency range from 62.5 to 7000 Hz. Most feature extraction algorithms used this 54‐coefficient representation. A knowledge‐engineering approach was used to create feature extraction algorithms from digital spectrograms of 2080 tokens of English letters produced by ten male and ten female talkers. Featural measurements included voice onset time, onset abruptness, formant frequencies, and formant trajectories, and all made use of the endpoints of the sound and/or vowel onset as temporal anchors. Feature values from a subset of the 2080 utterances were used to train the classifier, and system perfor...
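The spectral front end described above can be illustrated with a minimal sketch. It is not the authors' implementation. The sample rate, 3-ms step, 20-ms window, 54 coefficients, and 62.5 to 7000 Hz range come from the abstract; the Hamming window, the averaging of bin magnitudes within equal-width bands, and all function names are assumptions made for illustration. A 20-ms window at 16 000 Hz holds 320 samples, so this sketch takes a 320-point transform of each frame rather than a 128-point one; how the original system combined the two figures is not stated here.

```python
import cmath
import math

SAMPLE_RATE = 16_000        # Hz (from the abstract)
HOP = 48                    # 3 ms at 16 kHz (from the abstract)
WINDOW = 320                # 20 ms at 16 kHz (from the abstract)
N_COEFFS = 54               # compressed coefficients (from the abstract)
F_LO, F_HI = 62.5, 7000.0   # Hz (from the abstract)


def hamming(n_points):
    """Hamming window; the window shape is an assumption of this sketch."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..N/2 by direct summation (O(N^2), for clarity)."""
    n_points = len(frame)
    mags = []
    for k in range(n_points // 2 + 1):
        step = -2j * math.pi * k / n_points
        mags.append(abs(sum(x * cmath.exp(step * n) for n, x in enumerate(frame))))
    return mags


def compress(mags, n_points):
    """Average bin magnitudes inside N_COEFFS equal-width bands (assumed scheme)."""
    bin_hz = SAMPLE_RATE / n_points
    band_hz = (F_HI - F_LO) / N_COEFFS
    sums = [0.0] * N_COEFFS
    counts = [0] * N_COEFFS
    for k, mag in enumerate(mags):
        freq = k * bin_hz
        if F_LO <= freq < F_HI:
            band = int((freq - F_LO) / band_hz)
            sums[band] += mag
            counts[band] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def spectrogram(signal):
    """One 54-coefficient row per 3-ms step over the digitized signal."""
    window = hamming(WINDOW)
    rows = []
    for start in range(0, len(signal) - WINDOW + 1, HOP):
        frame = [signal[start + n] * window[n] for n in range(WINDOW)]
        rows.append(compress(dft_magnitudes(frame), WINDOW))
    return rows
```

Calling `spectrogram` on a list of samples returns a list of rows, each holding 54 values, which is the kind of representation the feature extraction algorithms are said to operate on. The direct-summation DFT is chosen only to keep the sketch dependency-free; an FFT routine would be the practical choice.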