Korean MULTEXT: A Korean Prosody Corpus

This paper describes the contents of Korean MULTEXT, a Korean prosody corpus that is a Korean version of the speech database Eurom1. The corpus consists of about 2 hours of read speech, transcribed primarily in orthography (in the Korean alphabet and in a Romanized transcription), as well as in IPA and in SAMPA. It also includes the original F0 values, stylized F0 values extracted using Momel, and hand-corrected F0 values. Prosodic events are annotated in two ways: automatically, using the INTSINT annotation algorithm, and manually, by labeling prosodic units with two tones on the hand-corrected pitch targets. The tone patterns resulting from the proposed Momel-based two-tone labeling are found to correspond to those defined in K-ToBI.
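
As an illustration of how the annotation layers listed above relate to one another, the following is a minimal sketch of a per-utterance record. It is not the corpus's actual file format. All class, field, and function names are hypothetical. The two manual tones are assumed to be written "H" and "L", and one manual tone label per hand-corrected pitch target is also an assumption. The INTSINT symbol set (T, M, B, H, S, L, U, D) is the standard one. The sample values are invented for illustration and are not taken from the corpus.

```python
from dataclasses import dataclass, field
from typing import List

# Standard INTSINT symbols: Top, Mid, Bottom, Higher, Same, Lower,
# Upstepped, Downstepped.
INTSINT_SYMBOLS = set("TMBHSLUD")
# Assumption: the two manual tones are written "H" and "L".
MANUAL_TONES = {"H", "L"}


@dataclass
class PitchTarget:
    time_s: float  # target time in seconds
    f0_hz: float   # target F0 in hertz


@dataclass
class Utterance:
    """Hypothetical record holding the layers named in the abstract."""
    orthography: str   # Korean alphabet
    romanized: str     # Romanized transcription
    ipa: str
    sampa: str
    raw_f0_hz: List[float] = field(default_factory=list)
    momel_targets: List[PitchTarget] = field(default_factory=list)
    corrected_targets: List[PitchTarget] = field(default_factory=list)
    intsint_labels: List[str] = field(default_factory=list)
    manual_tones: List[str] = field(default_factory=list)


def validate(utt: Utterance) -> List[str]:
    """Return a list of consistency problems (empty if none are found)."""
    problems = []
    if any(label not in INTSINT_SYMBOLS for label in utt.intsint_labels):
        problems.append("unknown INTSINT symbol")
    if any(tone not in MANUAL_TONES for tone in utt.manual_tones):
        problems.append("unknown manual tone")
    # Assumption: one manual tone per hand-corrected pitch target.
    if len(utt.manual_tones) != len(utt.corrected_targets):
        problems.append("tone/target count mismatch")
    times = [t.time_s for t in utt.corrected_targets]
    if times != sorted(times):
        problems.append("targets not in time order")
    return problems


# Invented example values, for illustration only.
example = Utterance(
    orthography="나무",
    romanized="namu",
    ipa="namu",
    sampa="namu",
    corrected_targets=[PitchTarget(0.10, 180.0), PitchTarget(0.35, 230.0)],
    intsint_labels=["M", "H"],
    manual_tones=["L", "H"],
)
print(validate(example))
```

Keeping the Momel output and the hand-corrected targets as separate fields mirrors the distinction the corpus draws between stylized and hand-corrected F0 values, so the two annotation routes can be compared on the same utterance.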