Integrating syllable boundary information into speech recognition

We examine the proposition that knowledge of the timing of syllabic onsets may help improve the performance of speech recognition systems. We have developed a method for estimating the location of syllable onsets from an analysis of energy trajectories in critical-band channels, and have designed and implemented a syllable-based decoder that incorporates this onset information into the speech recognition process. On a small continuous speech recognition task, adding artificial syllabic onset information (derived from advance knowledge of the word transcriptions) lowers the word error rate by 38%. Incorporating acoustically derived syllabic onset information reduces the word error rate by 10% on the same task. The latter experiment has highlighted representational issues in coordinating acoustic and lexical syllabifications, a topic we are beginning to explore.
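To make the idea of reading onsets off band-energy trajectories concrete, here is a minimal sketch. It is not the method developed in this work, whose details the abstract does not give. It assumes the band energies have already been computed as an array of shape (bands, frames), and it uses a simple rule chosen for illustration: average the frame-to-frame rise in log energy across bands and keep well-separated peaks. The function name `onset_candidates` and the parameters `frame_rate`, `threshold`, and `min_gap` are hypothetical.

```python
import numpy as np

def onset_candidates(band_energy, frame_rate=100.0, threshold=0.5, min_gap=0.1):
    """Return candidate onset times in seconds.

    band_energy: array of shape (n_bands, n_frames), non-negative energies.
    frame_rate, threshold, min_gap: illustrative values, not from the paper.
    """
    log_e = np.log(np.asarray(band_energy, dtype=float) + 1e-8)
    # Frame-to-frame change per band; keep only increases in energy.
    rise = np.maximum(np.diff(log_e, axis=1), 0.0)
    # Average the rises across bands to get one score per frame transition.
    score = rise.mean(axis=0)

    gap = int(round(min_gap * frame_rate))
    onset_frames = []
    for t in range(len(score)):
        left = score[t - 1] if t > 0 else -np.inf
        right = score[t + 1] if t + 1 < len(score) else -np.inf
        is_peak = score[t] >= left and score[t] > right
        if is_peak and score[t] >= threshold:
            frame = t + 1  # score[t] describes the step from frame t to t + 1
            if not onset_frames or frame - onset_frames[-1] >= gap:
                onset_frames.append(frame)
    return [f / frame_rate for f in onset_frames]
```

On a synthetic input with two bursts of energy, the sketch reports one candidate at the start of each burst. Real speech would need smoothing of the trajectories and a tuned threshold, both of which are left out here.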
