A segmental k-means training procedure for connected word recognition

Algorithms for recognizing strings of connected words from whole-word patterns have become highly efficient and accurate, although their computational demands remain high. Even the most ambitious connected-word recognition task is practical with today's integrated circuit technology, but extracting reliable, robust whole-word reference patterns is still difficult. In the past, connected-word recognizers relied on isolated-word reference patterns or on patterns derived from a limited context (e.g., the middle digit from strings of three digits). These whole-word patterns were adequate for speech articulated at slow rates, but not for strings of words spoken at high rates (e.g., about 200 to 300 words per minute). To alleviate this difficulty, a segmental k-means training procedure was used to extract whole-word patterns from naturally spoken word strings. The segmented words were then used to create a set of word reference patterns for recognition. String recognition accuracies were 98 to 99 percent for digits in variable-length strings and 90 to 98 percent for sentences from an airline reservation task. These scores represent significant improvements over previous connected-word recognizers.
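The abstract does not spell out the training loop, so the following is only a minimal sketch of the general segment-then-re-estimate idea, under strong simplifying assumptions: each word is represented by a single mean feature vector rather than a set of whole-word templates, the word transcript of every training string is known, and segmentation is a plain dynamic-programming alignment rather than the recognizer's own matching algorithm. The names `segment` and `train` and all parameters are hypothetical and chosen for illustration.

```python
import numpy as np


def segment(frames, means):
    """Align frames to an ordered word sequence (one row of `means` per word).

    Returns, for each frame, the position of the word it is assigned to.
    Segments are contiguous and in order; the total squared distance
    between frames and their word means is minimized.
    """
    T, K = len(frames), len(means)
    # cost[t, k]: squared distance from frame t to the k-th word's mean
    cost = ((frames[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    best = np.full((T, K), np.inf)
    back = np.zeros((T, K), dtype=int)  # 1 if the path advanced to a new word
    best[0, 0] = cost[0, 0]
    for t in range(1, T):
        for k in range(K):
            stay = best[t - 1, k]
            advance = best[t - 1, k - 1] if k > 0 else np.inf
            if advance < stay:
                best[t, k], back[t, k] = advance + cost[t, k], 1
            else:
                best[t, k], back[t, k] = stay + cost[t, k], 0
    # Trace back from the last frame of the last word.
    positions = np.zeros(T, dtype=int)
    k = K - 1
    for t in range(T - 1, -1, -1):
        positions[t] = k
        k -= back[t, k]
    return positions


def train(utterances, transcripts, vocab_size, n_iter=10):
    """Alternate between re-estimating word means and re-segmenting.

    utterances: list of (T, dim) arrays; transcripts: list of word-id lists.
    Assumes every utterance has at least as many frames as words.
    """
    dim = utterances[0].shape[1]
    # Initial guess: split each utterance uniformly among its words.
    positions = [np.arange(len(f)) * len(w) // len(f)
                 for f, w in zip(utterances, transcripts)]
    means = np.zeros((vocab_size, dim))
    for _ in range(n_iter):
        # Re-estimate each word's mean from the frames currently assigned to it.
        means = np.zeros((vocab_size, dim))
        counts = np.zeros(vocab_size)
        for f, w, p in zip(utterances, transcripts, positions):
            word_ids = np.asarray(w)[p]
            np.add.at(means, word_ids, f)
            np.add.at(counts, word_ids, 1)
        means /= np.maximum(counts, 1)[:, None]
        # Re-segment every utterance with the updated means.
        updated = [segment(f, means[np.asarray(w)])
                   for f, w in zip(utterances, transcripts)]
        if all(np.array_equal(a, b) for a, b in zip(positions, updated)):
            break  # segmentation stopped changing
        positions = updated
    return means, positions
```

Calling `train(utterances, transcripts, vocab_size)` returns the estimated word means together with the final frame-to-word assignment for each training string. A system closer to the one described would replace the single mean per word with whole-word reference patterns built from the segmented words.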
