Speech recognition using zero-crossing measurements and sequence information

A technique for the discrimination of speech sounds is proposed, based on measuring the rate at which zero crossings fall into previously defined channels. The rate-measuring circuits produce an analogue output voltage which indicates, as a function of time, the extent to which a given sound is present. The outputs of these circuits are then combined for maximum detection and integration before the sound is classified. The equipment can discriminate up to 16 sound classes, which correspond roughly to phonemes or groups of phonemes. When a word is spoken, a sequence of characters is generated, each character representing a sound class. This sequence is then processed by a set of algorithms to decide which word has been spoken. No prior segmentation of the speech signal is necessary, since segmentation is inherent in the proposed system. The technique incorporates an adaptation process consisting of (a) movement of the boundaries of the zero-crossing channels, (b) adjustment of reference and threshold levels, (c) control of timing, and (d) adjustment of the amount of a.c. bias mixed with the signal prior to zero-crossing detection. Tests conducted with 30 references and 12 unknown talkers (all male), using a vocabulary of 15 words (including digits), gave recognition scores in the range 91%–98%, depending on the conditions of the tests.
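
As an illustration only, the channel-based zero-crossing measurement described above might be sketched in software as follows. The equipment itself is built from analogue rate-measuring circuits, so this digital version is an assumption, as are the sample rate, the channel boundaries in BOUNDARIES, and every function name. The sketch counts how many zero-crossing intervals fall into each channel and takes the busiest channel as a crude stand-in for the maximum-detection step.

```python
import math

# Hypothetical channel boundaries, in seconds between successive zero crossings.
# These values are illustrative and are not taken from the paper.
BOUNDARIES = [0.0, 0.0004, 0.0008, 0.0015, 0.005]


def zero_crossing_intervals(samples, sample_rate):
    """Return the time (s) between successive sign changes of the signal."""
    crossing_indices = [
        i for i in range(1, len(samples))
        if (samples[i - 1] < 0) != (samples[i] < 0)
    ]
    return [
        (b - a) / sample_rate
        for a, b in zip(crossing_indices, crossing_indices[1:])
    ]


def channel_counts(intervals, boundaries):
    """Count the intervals falling in each channel [boundaries[k], boundaries[k+1])."""
    counts = [0] * (len(boundaries) - 1)
    for dt in intervals:
        for k in range(len(counts)):
            if boundaries[k] <= dt < boundaries[k + 1]:
                counts[k] += 1
                break
    return counts


def dominant_channel(counts):
    """Index of the channel with the most zero-crossing intervals."""
    return max(range(len(counts)), key=counts.__getitem__)


def sine(freq_hz, duration_s, sample_rate):
    """Synthetic test tone; the small phase offset avoids samples exactly at zero."""
    n = int(duration_s * sample_rate)
    return [
        math.sin(2 * math.pi * freq_hz * i / sample_rate + 0.1)
        for i in range(n)
    ]


if __name__ == "__main__":
    rate = 16000  # assumed sample rate in Hz
    for freq in (500, 2000):
        intervals = zero_crossing_intervals(sine(freq, 0.1, rate), rate)
        counts = channel_counts(intervals, BOUNDARIES)
        print(freq, counts, dominant_channel(counts))
```

In this sketch, a sequence of dominant-channel indices taken over successive short frames would play the role of the character sequence that the word-decision algorithms operate on. The adaptation steps (a) to (d) are not modelled.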