Modeling speech production using dynamic gestural structures

In the present computational model of speech production, an utterance is represented as an organization of primitive linguistic units, gestures, into a larger structure, a gestural score. Each distinct gesture is linked to a particular subset of vocal tract variables (e.g., lip aperture and protrusion) and model articulators (e.g., lips and jaw), and is associated with a set of time‐invariant dynamic parameters (e.g., lip aperture target, stiffness, and damping coefficients). The values of the dynamic parameters and their activation intervals are computed as part of the gestural score for a given utterance using a linguistic gestural model that includes a gesture‐based dictionary of English syllables and a flexible rule interpreter for manipulating dynamic parameters and inter‐gestural phasing. The gestural score serves as input to our task‐dynamic model of sensorimotor coordination. In this model, the evolving configuration of the model articulators results from the gesturally and posturally specific way...
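The dynamics underlying each gesture can be illustrated with a minimal sketch. In task-dynamic models of this kind, a gesture is commonly treated as a point attractor: a damped mass-spring system driving a tract variable toward its target while the gesture is active. The function and parameter names below are illustrative, not the model's actual implementation; the specific stiffness and damping values are arbitrary, and critical damping is assumed only for the sake of a clean, non-oscillatory example.

```python
import math

def simulate_gesture(x0, target, stiffness, damping, dt=0.001, duration=1.0):
    """Integrate unit-mass point-attractor dynamics for one tract variable:
        x'' = -stiffness * (x - target) - damping * x'
    using semi-implicit Euler steps. Returns the sampled trajectory."""
    x, v = x0, 0.0
    trajectory = [x]
    for _ in range(int(duration / dt)):
        a = -stiffness * (x - target) - damping * v  # restoring force + damping
        v += a * dt
        x += v * dt
        trajectory.append(x)
    return trajectory

# Example: a lip-aperture-like variable moving from 10 mm toward a 2 mm target.
# Critical damping (damping = 2 * sqrt(stiffness)) gives a smooth,
# non-oscillatory approach, as assumed for speech gestures in such models.
k = 100.0
traj = simulate_gesture(x0=10.0, target=2.0, stiffness=k, damping=2 * math.sqrt(k))
```

Because the parameters (target, stiffness, damping) are time-invariant, the gesture's trajectory shape follows entirely from its dynamic regime and activation interval rather than from an explicit stored movement path.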