Quality prediction for synthesized speech: Comparison of approaches

Text-to-Speech (TTS) technology has reached a level of maturity that appears sufficient for a number of telephony applications. To assess TTS quality, system developers need to carry out auditory tests in which participants are asked either to transcribe what they have heard or to rate certain aspects of the auditory event; see, e.g., [1]. Because auditory testing is time-consuming and costly, it is desirable to estimate quality directly from the speech signals, i.e., by means of instrumental measurements.