Theoretical aspects of mechanical speech recognition

The human listener is able to take in speech wave-motions and to perform a variety of operations based on them. As a device for transforming speech wave-motions into typescript, the listener is virtually free of errors under a great diversity of conditions. His recognition of speech sounds is based on the use of a language system, that is, a system of linguistic units (phonemes, morphemes, words and sentences). In taking in speech, he makes use of his knowledge of the constraints operating in his language, and thus resolves uncertainties and corrects errors arising at the level of acoustic recognition. The redundancy of speech is also exploited at the level of primary, or acoustic, recognition. A mechanical speech recognizer needs to simulate these two features of the human mechanism if it is to achieve even a small fraction of the human listener's flexibility and accuracy: it must carry out acoustic recognition by inspecting wave-motions in a variety of ways, and must then apply statistical knowledge of the language to the results of acoustic recognition.
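
The two-stage scheme described above (acoustic recognition followed by the application of statistical knowledge) can be illustrated with a minimal sketch. It is not the method of the paper: the phoneme labels, the acoustic scores, the bigram probabilities and the function names `acoustic_only` and `decode` are all hypothetical, and the Viterbi search over bigram probabilities is simply one common way of applying sequential constraints, chosen here for illustration.

```python
import math

# Hypothetical output of the acoustic stage: for each successive segment,
# a score P(observation | phoneme) for each candidate phoneme.
ACOUSTIC = [
    {"k": 0.60, "t": 0.40},
    {"a": 0.90, "o": 0.10},
    {"p": 0.55, "t": 0.45},
]

# Hypothetical statistical knowledge of the language: bigram probabilities
# P(next | previous). "<s>" marks the start of the utterance.
BIGRAM = {
    ("<s>", "k"): 0.5, ("<s>", "t"): 0.5,
    ("k", "a"): 0.8, ("k", "o"): 0.2,
    ("t", "a"): 0.5, ("t", "o"): 0.5,
    ("a", "t"): 0.9, ("a", "p"): 0.1,
    ("o", "t"): 0.5, ("o", "p"): 0.5,
}
FLOOR = 1e-6  # assumed probability for any pair missing from BIGRAM


def acoustic_only(acoustic):
    """Pick the best-scoring phoneme in each segment, ignoring context."""
    return [max(segment, key=segment.get) for segment in acoustic]


def decode(acoustic, bigram, floor=FLOOR):
    """Viterbi search combining acoustic scores with bigram constraints."""
    # best[unit] = (log score, path) of the best sequence ending in unit
    best = {"<s>": (0.0, [])}
    for segment in acoustic:
        nxt = {}
        for unit, p_acoustic in segment.items():
            candidates = []
            for prev, (score, path) in best.items():
                p_lang = bigram.get((prev, unit), floor)
                total = score + math.log(p_acoustic) + math.log(p_lang)
                candidates.append((total, path + [unit]))
            nxt[unit] = max(candidates, key=lambda c: c[0])
        best = nxt
    return max(best.values(), key=lambda c: c[0])[1]


if __name__ == "__main__":
    print("acoustic only:   ", acoustic_only(ACOUSTIC))
    print("with constraints:", decode(ACOUSTIC, BIGRAM))
```

With these invented figures the acoustic stage alone slightly prefers "p" in the final segment, whereas the sequential constraints favour "t" after "a", so the combined decision corrects what would otherwise be an acoustic error. This is the kind of correction the paragraph attributes to the listener's knowledge of his language.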