Some experiments with a simple word recognition system

This paper describes pilot experiments in a project to recognize selected words spoken by any talker. For this initial work, a population of 10 talkers is used and the 32 test words are spoken in isolation. In the experiments, a word is represented by a set of regularly spaced time samples of the normalized spectrum envelope, and recognition is achieved by comparing such a set of samples with a library of stored sets. Limitations on computer storage necessitate a compact method of coding a spectrum sample. Two methods have been compared: one classifies spectra into a very small number of types, while the other uses a 24-bit representation. The most serious problem encountered in spectrum matching is the well-known lack of synchronism between corresponding spectral events when phonemically identical words are spoken by different talkers. This paper describes an attack on this problem that relies on using a number of sets of spectrum samples to represent each word in the stored library of words to be recognized. The scores obtained from matching an unknown word with a library of known words can be used in a number of ways. Some of these are described, and results are given both for the case where the unknown word is known to be in the stored library and for the case where there is no such limitation.
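
As an illustration only, the matching scheme outlined above might be sketched as follows. The abstract does not state the scoring measure or the decision rule, so everything specific here is an assumption: each spectrum sample is taken to be a 24-bit integer code, two codes are compared by counting differing bits, a word's score is the smallest distance over its stored sample sets, and an optional threshold (the hypothetical parameter reject_above) stands in for the case where the unknown word need not be in the library. All function and variable names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: a word is a list of 24-bit integer codes, one per time sample.
Template = List[int]


def hamming24(a: int, b: int) -> int:
    """Count the bits that differ between two 24-bit spectrum codes."""
    return bin((a ^ b) & 0xFFFFFF).count("1")


def distance(unknown: Template, stored: Template) -> int:
    """Sum the per-sample bit differences of two equal-length sample sets."""
    if len(unknown) != len(stored):
        raise ValueError("sample sets must have the same length")
    return sum(hamming24(u, s) for u, s in zip(unknown, stored))


def recognize(
    unknown: Template,
    library: Dict[str, List[Template]],
    reject_above: Optional[int] = None,
) -> Tuple[Optional[str], int]:
    """Return (word, score) for the best-matching library word.

    Each word may have several stored sample sets; its score is the
    smallest distance over them (an assumed rule, not taken from the paper).
    If reject_above is given and the best score exceeds it, the word is
    reported as None, i.e. treated as not being in the library.
    """
    best_word: Optional[str] = None
    best_score: Optional[int] = None
    for word, templates in library.items():
        score = min(distance(unknown, t) for t in templates)
        if best_score is None or score < best_score:
            best_word, best_score = word, score
    if best_score is None:
        raise ValueError("library is empty")
    if reject_above is not None and best_score > reject_above:
        return None, best_score
    return best_word, best_score
```

Storing several sample sets per word lets one of them absorb a talker's particular timing of spectral events, at the cost of more storage and more comparisons per unknown word. Calling recognize without reject_above corresponds to the case where the unknown word is known to be in the library; supplying a threshold gives one simple way of handling the unrestricted case.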