A detection approach to search-space reduction for HMM state alignment in speaker verification

To support speaker verification (SV) in portable devices and in telephone servers with millions of users, a fast algorithm for hidden Markov model (HMM) alignment is necessary. Currently, the most popular algorithm is the Viterbi (1967) algorithm with beam search to reduce the search space; however, it is difficult to determine a suitable beam width beforehand. A small beam width may miss the optimal path, while a large one may slow down the alignment. To address this problem, we propose a nonheuristic approach to reducing the search space. Following the definition of the left-to-right HMM, we first detect the possible change-points between HMM states in a forward-and-backward scheme, then use the change-points to enclose a subspace for searching. The Viterbi algorithm or any other search algorithm can then be applied to the subspace to find the optimal state alignment. Compared to a full-search algorithm, the proposed algorithm is about four times faster and still slightly more accurate in an SV task; compared to the beam search algorithm, the proposed algorithm provides better accuracy with even lower complexity. In short, for an HMM with S states, the computational complexity can be reduced by up to a factor of S/3, with slightly better accuracy than a full-search approach. This paper also discusses how to extend the change-point detection approach to large-vocabulary continuous speech recognition.
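
The idea of searching only inside a subspace enclosed by change-points can be illustrated with a minimal sketch. It is not the paper's implementation: the change-point detector itself is not shown, and the per-state frame bounds `lo` and `hi`, the function name `constrained_viterbi`, the toy likelihoods, and the omission of transition probabilities are all assumptions made for illustration. The sketch runs a Viterbi recursion for a left-to-right HMM (stay in a state or advance by one) and evaluates only the cells inside the given bounds, counting how many cells were visited.

```python
import math

def constrained_viterbi(log_lik, lo, hi):
    """Viterbi alignment for a left-to-right HMM, restricted to a band.

    log_lik[t][s]: log-likelihood of frame t under state s.
    lo[s], hi[s]:  hypothetical earliest/latest frame (inclusive) at which
                   state s may be occupied, e.g. derived from change-points.
    Transition probabilities are assumed uniform and are left out.
    Returns (best log score, state path, number of cells evaluated).
    """
    T, S = len(log_lik), len(log_lik[0])
    NEG = -math.inf
    score = [[NEG] * S for _ in range(T)]
    back = [[-1] * S for _ in range(T)]
    cells = 0
    for t in range(T):
        for s in range(S):
            if not (lo[s] <= t <= hi[s]):
                continue  # cell lies outside the enclosed subspace
            cells += 1
            if t == 0:
                # The path must start in the first state.
                best, prev = (0.0, s) if s == 0 else (NEG, -1)
            else:
                stay = score[t - 1][s]
                move = score[t - 1][s - 1] if s > 0 else NEG
                best, prev = (stay, s) if stay >= move else (move, s - 1)
            if best > NEG:
                score[t][s] = best + log_lik[t][s]
                back[t][s] = prev
    if score[T - 1][S - 1] == NEG:
        raise ValueError("no valid path inside the given bounds")
    # Backtrace from the last state at the last frame.
    path = [S - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return score[T - 1][S - 1], path, cells

# Toy data: 6 frames, 3 states; frame t fits state t // 2 best.
T, S = 6, 3
log_lik = [[0.0 if s == t // 2 else -5.0 for s in range(S)] for t in range(T)]

full = constrained_viterbi(log_lik, [0] * S, [T - 1] * S)   # full search
band = constrained_viterbi(log_lik, [0, 1, 3], [2, 4, 5])   # assumed bounds
print(full[1], full[2])
print(band[1], band[2])
```

In this toy case both searches return the same alignment, while the banded search evaluates fewer cells. How tight the bounds can be made without excluding the optimal path depends on the change-point detector, which this sketch does not model.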

[1]  A. Singer, et al.  Detection and Estimation of , 1999.

[2]  B. Juang, et al.  Verification using verbal information verification for automatic enrollment, 1997.

[3]  Hermann Ney, et al.  Dynamic programming search for continuous speech recognition, 1999, IEEE Signal Process. Mag.

[4]  Chin-Hui Lee, et al.  A frame-synchronous network search algorithm for connected word recognition, 1989, IEEE Trans. Acoust. Speech Signal Process.

[5]  John G. Proakis, et al.  Probability, random variables and stochastic processes, 1985, IEEE Trans. Acoust. Speech Signal Process.

[6]  Rakesh K. Bansal, et al.  An algorithm for detecting a change in a stochastic process, 1986, IEEE Trans. Inf. Theory.

[7]  B. Brodsky, et al.  Nonparametric Methods in Change Point Problems, 1993.

[8]  Bruce Lowerre, et al.  The Harpy speech understanding system, 1990.

[9]  Biing-Hwang Juang, et al.  Speaker verification using verbal information verification for automatic enrolment, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[10]  Frank K. Soong, et al.  An N-best candidates-based discriminative training for speech recognition applications, 1994, IEEE Trans. Speech Audio Process.

[11]  E. S. Page.  A test for a change in a parameter occurring at an unknown point, 1955.

[12]  Edward Carlstein, et al.  Change-point problems, 1994.

[13]  N. Deshmukh, et al.  Hierarchical search for large-vocabulary conversational speech recognition: working toward a solution to the decoding problem, 1999.

[14]  E. S. Page.  Continuous inspection schemes, 1954.

[15]  Hermann Ney, et al.  Improvements in beam search for 10000-word continuous-speech recognition, 1994, IEEE Trans. Speech Audio Process.

[16]  Qi Li, et al.  A fast, sequential decoding algorithm with application to speaker verification, 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[17]  Biing-Hwang Juang, et al.  Automatic verbal information verification for user authentication, 2000, IEEE Trans. Speech Audio Process.

[18]  H. Ney, et al.  Improvements in beam search for 10000-word continuous speech recognition, 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[19]  Michael Picheny, et al.  Large vocabulary natural language continuous speech recognition, 1989, International Conference on Acoustics, Speech, and Signal Processing.

[20]  G. Lorden.  Procedures for reacting to a change in distribution, 1971.

[21]  Aaron E. Rosenberg, et al.  Speaker background models for connected digit password speaker verification, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[22]  Richard M. Schwartz, et al.  Search Algorithms for Software-Only Real-Time Recognition with Very Large Vocabularies, 1993, HLT.

[23]  Qi Li, et al.  A fast decoding algorithm based on sequential detection of the changes in distribution, 1998, ICSLP.

[24]  R. Bellman.  Dynamic programming, 1957, Science.

[25]  A. Viterbi.  Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm, 1967, IEEE Trans. Inf. Theory.

[26]  John H. L. Hansen, et al.  Discrete-Time Processing of Speech Signals, 1993.

[27]  Aaron E. Rosenberg, et al.  General phrase speaker verification using sub-word background models and likelihood-ratio scoring, 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.