Recent work at Bell Laboratories has demonstrated the utility of applying sophisticated pattern recognition techniques to obtain a set of speaker-independent word templates for an isolated word recognition system [Levinson et al., IEEE Trans. Acoust. Speech Signal Process. ASSP-27 (2), 134--141 (1979); Rabiner et al., IEEE Trans. Acoust. Speech Signal Process. (in press)]. These studies showed that a careful experimenter could guide the clustering algorithms to choose a small set of templates that was representative of a large number of replications of each word in the vocabulary. Subsequent word recognition tests verified that the chosen templates were indeed representative of a fairly large population of talkers. Given the success of this approach, the next important step is to investigate fully automatic techniques for clustering multiple versions of a single word into a set of speaker-independent word templates. Two such techniques are described in this paper. The first method uses distance data (between replications of a word) to segment the population into stable clusters. The word template for each cluster is obtained either as the cluster minimax or as an averaged version of all the elements in the cluster. The second method is a variation of the one described by Rabiner [IEEE Trans. Acoust. Speech Signal Process. ASSP-26 (3), 34--42 (1978)], in which averaging techniques are combined directly with the nearest neighbor rule to define simultaneously both the word template (i.e., the cluster center) and the elements in the cluster. Experimental data show the first method to be superior to the second when three or more clusters per word are used in the recognition task.
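As an illustration of how a template might be picked from distance data alone, the following minimal sketch selects a "minimax" element for a cluster. The abstract does not define the term, so the sketch assumes the common reading: the cluster member whose largest distance to any other member is smallest. The function name `minimax_center`, the toy distance matrix, and the cluster memberships are hypothetical and are not taken from the paper.

```python
def minimax_center(dist, members):
    """Return (index, radius) for the member of `members` whose largest
    distance to any other member is smallest.

    dist    -- symmetric matrix (list of lists) of pairwise distances
               between replications of one word (assumed given)
    members -- indices of the replications assigned to one cluster
    """
    best, best_radius = None, float("inf")
    for i in members:
        # Worst-case distance from candidate i to the rest of the cluster;
        # a singleton cluster has radius 0.0.
        radius = max((dist[i][j] for j in members if j != i), default=0.0)
        if radius < best_radius:  # ties keep the earlier candidate
            best, best_radius = i, radius
    return best, best_radius


# Toy data: five replications of one word, assumed already split into
# two clusters by some earlier clustering step.
dist = [
    [0.0, 0.2, 0.5, 2.0, 2.1],
    [0.2, 0.0, 0.3, 1.9, 2.2],
    [0.5, 0.3, 0.0, 1.8, 2.0],
    [2.0, 1.9, 1.8, 0.0, 0.4],
    [2.1, 2.2, 2.0, 0.4, 0.0],
]
clusters = {"A": [0, 1, 2], "B": [3, 4]}

# One template index per cluster.
templates = {name: minimax_center(dist, idx)[0] for name, idx in clusters.items()}
```

Under this reading the template is always an actual replication, whereas the averaging alternative mentioned above would produce a synthetic pattern and would need the underlying feature sequences in addition to the distances.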
[1] S. Levinson, et al. Considerations in dynamic time warping algorithms for discrete word recognition, 1978.
[2] J. Gowdy, et al. A speaker-independent speech-recognition system based on linear prediction, 1978.
[3] Lawrence R. Rabiner, et al. On creating reference templates for speaker independent recognition of isolated words, 1978.
[4] José M. Tribolet, et al. Statistical properties of an LPC distance measure, 1979, ICASSP.
[5] C. E. Schmidt, et al. Recognition of spoken spelled names applied to directory assistance, 1977.
[6] Aaron E. Rosenberg, et al. Interactive clustering techniques for selecting speaker-independent reference templates for isolated word recognition, 1979.
[7] Aaron E. Rosenberg, et al. New techniques for automatic speaker verification, 1975.
[8] Aaron E. Rosenberg, et al. Speaker independent recognition of isolated words using clustering techniques, 1979, ICASSP.
[9] F. Itakura, et al. Minimum prediction residual principle applied to speech recognition, 1975.
[10] A. E. Rosenberg, et al. Evaluation of an automatic word recognition system over dialed-up telephone lines, 1976.