Estimating speaking rate in spontaneous speech from z-scores of pattern durations

We propose a novel method for estimating speaking rate from the durations of similar speech patterns, as a first step toward determining how the various speaking styles used in everyday conversation relate to speaker intentions and attitudes. Whereas most methods of determining speaking rate require manually obtained label information or linguistic knowledge, the proposed method uses patterns of speech-sound sequences that occur relatively frequently in dialogue speech and that are detected from the speech waveform alone. The method uses a speech recognizer as a phonetic typewriter, that is, to provide a rough classification of the speech sounds, and does not require the recognition to be accurate in any meaningful linguistic sense. Given a large body of natural dialogue speech data, it divides the label sequences output by the recognizer/classifier into variable-length patterns by maximum likelihood and groups together all speech segments that share the same pattern. As an index of speaking rate, it then calculates the z-score of each segment's duration relative to the duration distribution of its pattern group. We evaluated the validity of the speaking-rate estimates obtained in this way.
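
The z-score step described above can be illustrated with a minimal Python sketch. It assumes the recognizer output has already been segmented into (pattern, duration) pairs. The function name, the pattern labels, the durations, the `min_count` threshold, and the use of the population standard deviation are all illustrative assumptions and are not taken from the method itself.

```python
from collections import defaultdict
from statistics import mean, pstdev


def pattern_duration_zscores(segments, min_count=2):
    """Return one z-score per (pattern, duration) segment.

    A segment gets None when its pattern group has fewer than
    `min_count` members or shows no variation in duration.
    `min_count` and the population standard deviation are
    assumptions of this sketch.
    """
    # Collect the durations of all segments that share a pattern label.
    by_pattern = defaultdict(list)
    for pattern, duration in segments:
        by_pattern[pattern].append(duration)

    # Per-group mean and standard deviation, kept only for usable groups.
    stats = {}
    for pattern, durations in by_pattern.items():
        if len(durations) >= min_count:
            sd = pstdev(durations)
            if sd > 0:
                stats[pattern] = (mean(durations), sd)

    # z = (duration - group mean) / group standard deviation
    zscores = []
    for pattern, duration in segments:
        if pattern in stats:
            mu, sd = stats[pattern]
            zscores.append((duration - mu) / sd)
        else:
            zscores.append(None)
    return zscores


# Hypothetical segments: (pattern label, duration in seconds).
segments = [
    ("a-k-i", 0.30),
    ("a-k-i", 0.50),
    ("s-o", 0.20),
    ("a-k-i", 0.40),
    ("s-o", 0.20),
    ("t-e", 0.15),
]
zscores = pattern_duration_zscores(segments)
```

In this sketch a positive z-score means the segment is longer than is typical for its pattern, which corresponds to slower speech, and a negative z-score corresponds to faster speech. Because each duration is normalized within its own pattern group, scores from different patterns are on a common scale.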