Introduction to Part IV

This part consists of five papers on how to use prosodic information in automatic speech recognition, that is, prosodic features of speech such as pitch, energy, and duration cues. As earlier chapters have shown, prosodic information plays an important role in human speech communication. In the last few years, speech recognition systems have improved dramatically, and automatic speech understanding is now a realistic goal. These developments make the recognition of prosodic features potentially more valuable. A transcription of the spoken word sequence alone may not provide enough information for accurate speech understanding, because the same word sequence can carry different meanings depending on its prosody. Meaning is affected by phrase boundaries, pitch accents, and tone (intonation). For example, detecting where phrase boundaries fall is useful for syntactic disambiguation, and tone is useful for determining whether or not an utterance is a yes-no question. In English, there are many noun-verb and noun-adjective pairs in which a shift in word accent signals a change in meaning (for example, CONtract as a noun versus conTRACT as a verb). Phrase boundary detection is also useful for reducing the search space, and hence the amount of computation, in continuous speech recognition.