On the Objectivity of Prosodic Phrases

Abstract

Objective annotation of prosodic phrases in a corpus for a text-to-speech system is an important issue due to its influence on the naturalness of synthesised speech. The paper discusses drawbacks of common ways of prosodic phrase annotation and proposes a concept of prosodic phrases defined by a maximum likelihood estimation over results of many parallel subjective annotations. Validity of this method is analysed in terms of agreement among the subjects using Cohen’s and Fleiss’ kappa measures and heuristically modified relative agreement.

Keywords: Prosodic phrase, objective annotation, speech corpus

1.0 Introduction

A text-to-prosody (TTP) system as a subsystem of a text-to-speech (TTS) system can be conceived and developed in terms of a machine learning (ML) paradigm. Such a conception, however, requires the existence of suitable training and testing databases covering desired prosodic phenomena. In this case, what does “suitable” mean? How much of such data do we need? And, most importantly,