Improving the phonetic annotation by means of prosodic phrasing

It was established that the performance of our annotation system [8] is affected by the length of the utterances: the error rate, the CPU load and the memory requirements tend to increase as the utterances get longer. In this contribution the speech signal is first segmented into speech, pauses and noise (breaths, clicks, ...) and subsequently split into signal phrases prior to the annotation. Experiments on 3 different databases (3 languages) demonstrate that this strategy yields a significant improvement of the annotation accuracy.
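The phrase-splitting step can be illustrated with a minimal sketch. The abstract does not state how phrase boundaries are chosen, so the criterion used here (cut at every pause of at least `min_pause` seconds) is an assumption made purely for illustration, as are the `Segment` type, the `split_into_phrases` function and the 0.2 s default. The input is assumed to be the output of a prior speech/pause/noise segmentation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    label: str    # "speech", "pause" or "noise" (hypothetical label set)
    start: float  # start time in seconds
    end: float    # end time in seconds


def split_into_phrases(segments: List[Segment],
                       min_pause: float = 0.2) -> List[List[Segment]]:
    """Group consecutive segments into phrases.

    Assumed rule (not taken from the paper): a pause lasting at least
    `min_pause` seconds closes the current phrase. Shorter pauses and
    noise stay inside the phrase. Groups without any speech are dropped.
    """
    phrases: List[List[Segment]] = []
    current: List[Segment] = []
    for seg in segments:
        is_boundary = (seg.label == "pause"
                       and (seg.end - seg.start) >= min_pause)
        if is_boundary:
            if any(s.label == "speech" for s in current):
                phrases.append(current)
            current = []
        else:
            current.append(seg)
    # Flush the last group if it contains speech.
    if any(s.label == "speech" for s in current):
        phrases.append(current)
    return phrases
```

Each resulting phrase could then be passed to the annotation system on its own, which is how shorter inputs would reduce the length-related cost described above.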

[1] J. P. Martens, et al. Pitch and voiced/unvoiced determination with an auditory model, 1992, The Journal of the Acoustical Society of America.

[2] Mari Ostendorf, et al. Automatic labeling of prosodic patterns, 1994, IEEE Trans. Speech Audio Process.

[3] Nick Campbell, et al. Automatic detection of prosodic boundaries in speech, 1993, Speech Commun.

[4] Dirk Van Compernolle, et al. Reduced semi-continuous models for large vocabulary continuous speech recognition in Dutch, 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[5] Nick Campbell. Prosodic influence on segmental quality, 1995, EUROSPEECH.

[6] P. Mermelstein. Automatic segmentation of speech into syllabic units, 1975, The Journal of the Acoustical Society of America.

[7] Jean-Pierre Martens, et al. Automatic segmentation and labelling of multi-lingual speech data, 1996, Speech Commun.

[8] Jean-Pierre Martens, et al. Fast automatic segmentation and labeling: results on TIMIT and EUROMO, 1995, EUROSPEECH.

[9] Jean-Pierre Martens, et al. Broad phonetic classification and segmentation of continuous speech by means of neural networks and dynamic programming, 1991, Speech Commun.

[10] Mari Ostendorf, et al. Automatic recognition of prosodic phrases, 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.