A comparative study of HMM-based approaches for the automatic recognition of perceptually relevant aspects of spontaneous German speech melody

Three approaches to the speaker-independent automatic recognition of melodic aspects of spontaneous German speech are presented. All three systems are based on Hidden Markov Models. Their input is restricted to the speech signal, from which a feature extraction component derives eleven prosodic features. No additional information of the kind commonly used for prosody recognition, such as word chains, word hypotheses, or further segmental or lexical prosodic information (e.g. stress placement), is required. The three systems are tested and compared with respect to their performance on a speaker-independent recognition task on spontaneous German speech. The task focuses on three functional aspects of speech melody (accent lending, boundary signalling, concatenating pitch movements), with the pause as a fourth category.