HarkMan—A vocabulary-independent keyword spotter for spontaneous Chinese speech

This paper introduces the novel techniques adopted in HarkMan, a keyword spotter designed to automatically spot given words in unconstrained Chinese telephone speech for vocabulary-independent tasks. Neither the speaking manner nor the number of keywords is limited. The paper focuses on the techniques that address acoustic modeling, the keyword-spotting network, search strategies, robustness, and rejection. The underlying technologies described here are useful not only for keyword spotting but also for continuous speech recognition. The system has achieved a figure-of-merit value of over 90%.
