Between-word distance calculation in a symbolic domain and its applications to speech recognition

Abstract This paper discusses a new framework for speech recognition processing based on distance calculation in a symbolic domain. It presents a more efficient processing alternative to the conventional statistical method, which relies on large amounts of speech samples. We first propose a method for calculating the distance between words (i.e., the between-word distance (BWD)) in a symbolic domain. We then present two applications in speech recognition. The distance calculation employs an optimal matching between subphonemic segment sequences using dynamic programming (DP) to take context-dependent characteristics into account. One application of the distance calculation is to use the distributions of the BWDs to estimate the relative degree of difficulty of speech recognition for given word sets. We provide effective indices for this degree of difficulty using statistical parameters of these distance distributions. The other application is to predict plausible word candidates in relation to unknown word processing. We report preliminary experiments in word recognition based on the distance between subphonemic sequences, calculated in a symbolic domain. The proposed recognition procedure achieves performance closely comparable to a conventional hidden Markov model (HMM)-based speech recognition procedure. The results indicate a feasible prospect for efficient processing of unknown words or for multi-category speech recognition, although some deterioration of the recognition score itself is to be expected.
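To make the core idea concrete, the following is a minimal sketch of a DP-based between-word distance over subphonemic symbol sequences, together with a simple distribution-based difficulty index for a word set. The uniform substitution/insertion/deletion costs, the toy subphonemic labels, and the particular index (mean, standard deviation, and minimum of the pairwise BWDs) are illustrative assumptions only; the paper's actual cost function, segment inventory, and index definitions may differ.

```python
# Sketch only: DP alignment distance between words given as subphonemic
# symbol sequences, and a pairwise-distance summary over a vocabulary.
# Costs and labels are hypothetical, not taken from the paper.

from statistics import mean, stdev
from itertools import combinations


def bwd(seq_a, seq_b, sub_cost=1.0, indel_cost=1.0):
    """Between-word distance via DP (edit-distance-style optimal matching)."""
    n, m = len(seq_a), len(seq_b)
    # dp[i][j] = minimal alignment cost of seq_a[:i] vs seq_b[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * indel_cost
    for j in range(1, m + 1):
        dp[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if seq_a[i - 1] == seq_b[j - 1] else sub_cost
            dp[i][j] = min(
                dp[i - 1][j - 1] + cost,    # match / substitution
                dp[i - 1][j] + indel_cost,  # deletion
                dp[i][j - 1] + indel_cost,  # insertion
            )
    return dp[n][m]


def difficulty_index(word_symbol_seqs):
    """Summarize the pairwise BWD distribution of a word set.

    A small mean (or minimum) distance means the words lie close together
    in the symbolic space, suggesting a harder recognition task.
    """
    dists = [bwd(a, b) for a, b in combinations(word_symbol_seqs, 2)]
    return {"mean": mean(dists), "stdev": stdev(dists), "min": min(dists)}


if __name__ == "__main__":
    # Toy vocabulary: each word as a sequence of hypothetical subphonemic labels.
    vocab = [
        ["k-a", "a", "a-t", "t"],   # "cat"-like
        ["k-a", "a", "a-p", "p"],   # "cap"-like: confusable with the first
        ["d-o", "o", "o-g", "g"],   # "dog"-like
    ]
    print(bwd(vocab[0], vocab[1]))  # small distance for the confusable pair
    print(difficulty_index(vocab))
```

In the same spirit, the unknown-word application would rank vocabulary entries by their BWD to a recognized subphonemic sequence and propose the nearest entries as plausible candidates.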
