Beat gesture prediction using prosodic features

In this work we present a machine learning approach to predicting beat gestures from prosodic features. We use conditional random fields (CRFs) to predict the presence of beat gestures from the following prosodic features: pitch, pitch derivatives, intensity, and the presence or absence of syllable nuclei. These features are computed over overlapping sliding windows large enough to average out the high-frequency variations in pitch and intensity at the syllable level. We found that the results improve markedly when the task is treated as a multi-class classification problem rather than a binary problem with only two classes, presence and absence of gesture.
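As a rough illustration of the pipeline described above, the following minimal sketch computes windowed prosodic features and trains a CRF sequence labeler with the sklearn-crfsuite library. It assumes frame-level pitch, intensity, and syllable-nucleus tracks have already been extracted; the window and hop sizes, the gesture label set, and the CRF hyperparameters are placeholders, not the settings used in this work.

```python
# Sketch: windowed prosodic features -> CRF labeling of beat gestures.
# Frame-level pitch (Hz), intensity (dB) and a binary syllable-nucleus track
# are assumed to be precomputed numpy arrays; all constants are illustrative.
import numpy as np
import sklearn_crfsuite

WIN = 25  # frames per overlapping window (assumed long enough to smooth syllable-level variation)
HOP = 5   # frame shift between successive windows

def window_features(pitch, intensity, nuclei):
    """Average prosodic features over overlapping sliding windows."""
    feats = []
    for start in range(0, len(pitch) - WIN + 1, HOP):
        p = pitch[start:start + WIN]
        i = intensity[start:start + WIN]
        n = nuclei[start:start + WIN]
        feats.append({
            "pitch_mean": float(np.nanmean(p)),
            "pitch_delta": float(np.nanmean(np.diff(p))),  # crude pitch derivative
            "intensity_mean": float(np.mean(i)),
            "has_nucleus": bool(np.any(n)),                # syllable nucleus in window?
        })
    return feats

def train_crf(X, y):
    """X: one feature-dict sequence per utterance; y: aligned per-window labels.

    Labels can be multi-class (e.g. hypothetical "no_gesture", "beat_prep",
    "beat_stroke") rather than just gesture presence/absence.
    """
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
    crf.fit(X, y)
    return crf
```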