Disambiguating Recognition Results by Prosodic Features

To make effective use of prosodic features in automatic speech recognition, a method was proposed for checking the suitability of each recognition candidate against its fundamental frequency (F0) contour. In this method, an F0 contour is generated for each recognition candidate and compared with the observed contour. The contours are generated using prosodic rules previously developed for text-to-speech conversion, and the comparison is restricted to the portion of the utterance where recognition is ambiguous, using a newly developed scheme called partial analysis-by-synthesis. The candidate whose generated contour best matches the observed contour is selected as the final recognition result. The method was shown to be effective in detecting recognition errors that involve changes in accent type and/or syntactic boundaries, and its performance in detecting phrase boundaries was also evaluated. The results indicated that boundaries can be detected either at the correct position or with a location error of at most one mora.
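
The selection procedure can be illustrated with a small sketch. The abstract does not specify the contour model or the distance measure, so the following are assumptions: each candidate's contour is generated with a Fujisaki-style command-response model of log F0 (phrase and accent commands), the hypothetical commands are written by hand instead of being produced by text-to-speech prosodic rules, and "matching" is taken as the mean squared log-F0 error over the ambiguous frame span only. The analysis-by-synthesis step is reduced to a crude grid search over a scaling of accent amplitudes. All function names and parameter values (`alpha`, `beta`, `gamma`, `fb`, `scales`) are illustrative, not taken from the original work.

```python
import math

# Hypothetical sketch: candidate selection by partial matching of F0 contours.
# Contour model, parameters and distance are assumptions, not the paper's settings.

def phrase_response(t, alpha=3.0):
    # Phrase component impulse response: Gp(t) = alpha^2 * t * exp(-alpha * t), t >= 0
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0

def accent_response(t, beta=20.0, gamma=0.9):
    # Accent component step response: Ga(t) = min(1 - (1 + beta*t) exp(-beta*t), gamma), t >= 0
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def generate_log_f0(times, phrase_cmds, accent_cmds, fb=100.0):
    """phrase_cmds: [(onset, Ap)]; accent_cmds: [(onset, offset, Aa)]; returns ln F0 per frame."""
    contour = []
    for t in times:
        v = math.log(fb)  # baseline frequency Fb in Hz
        for t0, ap in phrase_cmds:
            v += ap * phrase_response(t - t0)
        for t1, t2, aa in accent_cmds:
            v += aa * (accent_response(t - t1) - accent_response(t - t2))
        contour.append(v)
    return contour

def partial_distance(observed, generated, start, end):
    # Mean squared error restricted to the ambiguous frames [start, end)
    if end <= start:
        raise ValueError("empty comparison span")
    return sum((observed[i] - generated[i]) ** 2 for i in range(start, end)) / (end - start)

def fit_candidate(observed, times, phrase_cmds, accent_cmds, start, end,
                  scales=(0.5, 0.75, 1.0, 1.25, 1.5)):
    # Crude analysis-by-synthesis: try several accent-amplitude scalings and
    # keep the smallest distance on the ambiguous span.
    best = math.inf
    for s in scales:
        scaled = [(t1, t2, aa * s) for t1, t2, aa in accent_cmds]
        generated = generate_log_f0(times, phrase_cmds, scaled)
        best = min(best, partial_distance(observed, generated, start, end))
    return best

def select_candidate(observed, candidates, times, start, end):
    """candidates: {name: (phrase_cmds, accent_cmds)}; returns the best-matching name."""
    best_name, best_d = None, math.inf
    for name, (pc, ac) in candidates.items():
        d = fit_candidate(observed, times, pc, ac, start, end)
        if d < best_d:
            best_name, best_d = name, d
    return best_name

if __name__ == "__main__":
    times = [i * 0.01 for i in range(200)]  # 10 ms frames, 2 s
    phrase = [(0.0, 0.5)]
    # Two hypothetical candidates differing only in accent placement
    cands = {"A": (phrase, [(0.3, 0.6, 0.4)]), "B": (phrase, [(0.8, 1.1, 0.4)])}
    observed = generate_log_f0(times, *cands["A"])  # synthetic "observed" contour
    print(select_candidate(observed, cands, times, 20, 120))  # prints A
```

Restricting the distance to the ambiguous span means that portions of the utterance on which all candidates agree do not dilute the comparison; a real system would also fit command timings and use an observed contour extracted from speech rather than a synthetic one.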
