Paper information - Rules, but what for? - rule description as efficient and robust abstraction of corpora and optimal fitting to applications -

Two recent studies, one in speech recognition and one in speech synthesis, are introduced to reconsider what rules should be sought in spoken language science and technology. To abstract the word neighboring characteristics expressed by N-grams, multi-class composite N-grams have been proposed to model POS characteristics and inflectional forms separately. It is shown that statistical clustering can provide a more compact and robust description of word neighboring characteristics than conventional N-grams. For speech synthesis, segmental duration modeling has been examined from the viewpoint of the perceptual characteristics of duration changes. A series of perceptual experiments has shown that sensitivity to duration change depends on context. These two examples respectively illustrate how current rules are interpreted to build scientifically acceptable engineering models, and they remind us of the deeper scientific meaning and limitations of generalization as a rule.

Yoshinori Sagisaka | Hirofumi Yamamoto | Minoru Tsuzaki | Hiroaki Kato

[1] Yoshinori Sagisaka, et al. Variable-order N-gram generation by word-class splitting and consecutive word grouping, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[2] A. Cohen, et al. Structure and Process in Speech Perception, 1975.

[3] Yoshinori Sagisaka, et al. Spontaneous dialogue speech recognition using cross-word context constrained word graphs, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[4] Robert L. Mercer, et al. Class-Based n-gram Models of Natural Language, 1992, CL.

[5] Yoshinori Sagisaka, et al. Multi-class composite N-gram based on connection direction, 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[6] Y. Sagisaka, et al. Acceptability for temporal modification of consecutive segments in isolated words, 1997, The Journal of the Acoustical Society of America.

[7] Rolf Carlson, et al. Perception of Segmental Duration, 1975.

[8] A. Huggins, et al. On the perception of temporal phenomena in speech, 1972, The Journal of the Acoustical Society of America.

[9] Y. Sagisaka, et al. Acceptability for temporal modification of single vowel segments in isolated words, 1998, The Journal of the Acoustical Society of America.

[10] Yoshinori Sagisaka, et al. Effects of phonetic quality and duration on perceptual acceptability of temporal changes in speech, 1998, ICSLP.

[11] Minoru Tsuzaki, et al. Intensity effect on discrimination of auditory duration flanked by preceding and succeeding tones, 1994.