A computational model of prosody perception

This paper describes a computational model of auditory rhythm perception and demonstrates its application to the extraction of prosodic information from spoken language. The model consists of three stages. In the first stage, the speech waveform is processed by a simulation of the auditory periphery. In the second stage, the output of the auditory periphery is processed by a multiscale filtering mechanism, analogous to a short-term auditory memory. In the third stage, peaks in the response of the multiscale mechanism are accumulated in a long-term auditory store and plotted to give a representation referred to as a rhythmogram. The rhythmogram of an utterance is shown to correspond closely to the stress hierarchy derived by phonological analysis.
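
To make the three-stage pipeline concrete, the following is a minimal sketch in Python. The peripheral stage here is a crude stand-in (half-wave rectification plus a low-pass envelope) rather than the paper's auditory model, and the function names, filter cutoffs, and time scales are illustrative assumptions, not the authors' implementation. The intended behaviour it illustrates is that coarser time scales retain only the stronger amplitude events, so plotting peak times against scale yields a tree-like, rhythmogram-style picture.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks
from scipy.ndimage import gaussian_filter1d

def periphery(x, fs, cutoff_hz=50.0):
    """Stage 1 (stand-in): half-wave rectify and low-pass the waveform
    to approximate the smoothed output of the auditory periphery."""
    b, a = butter(2, cutoff_hz / (fs / 2))
    return filtfilt(b, a, np.maximum(x, 0.0))

def multiscale(env, fs, scales_s):
    """Stage 2: smooth the envelope at several time scales, analogous
    to a short-term auditory memory with a range of decay times."""
    return np.stack([gaussian_filter1d(env, s * fs) for s in scales_s])

def rhythmogram(responses, scales_s):
    """Stage 3: collect peak times at each scale; plotting peak time
    against scale gives the rhythmogram-like representation."""
    return [(s, find_peaks(r)[0]) for s, r in zip(scales_s, responses)]

# Usage: a toy 'utterance' of two amplitude bursts, the first stronger.
fs = 16000
t = np.arange(0, 1.0, 1.0 / fs)
x = (np.exp(-((t - 0.3) ** 2) / 1e-3)
     + 0.6 * np.exp(-((t - 0.7) ** 2) / 1e-3)) * np.sin(2 * np.pi * 200 * t)

scales = [0.02, 0.05, 0.1, 0.2]  # seconds; coarser scales keep fewer peaks
for s, p in rhythmogram(multiscale(periphery(x, fs), fs, scales), scales):
    print(f"scale {s:.2f} s: peaks at {p / fs} s")
```

In this sketch the stronger burst survives to coarser scales than the weaker one, which is the sense in which the representation mirrors a stress hierarchy.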