Motor control of the tongue during speech: Predictions of an optimization policy under sensorimotor noise

Speech production demands high dexterity. Some sounds, such as /i/, /s/ or /l/, require very precise positioning of the tongue, while task constraints can vary widely (inertial perturbations while walking or running, vocal tract perturbations during eating...). Moreover, tongue movements are very rapid, on the order of a few tenths of a second, while proprioceptive and auditory feedback latencies are longer. Finally, sensory and motor signals have limited precision (they are 'noisy'). For all these reasons, it has been proposed that the central nervous system (CNS) relies on an optimal state estimation system (an 'internal model') to adjust trajectories in real time. The CNS also seems to optimize some perceived cost, as suggested by the motor stereotypies observed in eye and arm movements. In particular, the hypotheses of the minimization of an internal measure of effort, or of the minimization of the impact of motor noise on endpoint variance, account for a large body of experimental observations. Here we test whether an effort-minimizing controller coupled to an optimal state estimator can account for the trajectories of the tongue when a subject produces the three vowels /i/, /a/ and /u/ from a neutral (schwa) initial posture. As a first step, we ran 20,000 simulations of a finite element model of the tongue in order to characterize the effect of combinations of muscle activation ramps across six muscles of the tongue (3 intrinsic, 3 extrinsic). We then applied model identification techniques to obtain a computationally tractable dynamical model of the tongue in the sagittal plane. Assuming a fixed jaw position and a standard geometry for the rest of the vocal tract, we obtained a simplified model of the speech production system. We could then derive the first three formants of the voice from the instantaneous tongue position through a harmonic analysis.
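The model identification step can be illustrated with a minimal sketch. The abstract does not specify the identification technique, so the following assumes, purely for illustration, a linear discrete-time model x[t+1] = A x[t] + B u[t] fitted by least squares to state/activation samples (here generated from a synthetic ground-truth system standing in for the finite element simulations; the six inputs stand for the six muscle activations):

```python
import numpy as np

# Illustrative sketch (not the authors' pipeline): identify a linear
# discrete-time model  x[t+1] = A x[t] + B u[t]  from simulated
# trajectories, as one simple instance of model identification.
rng = np.random.default_rng(0)
n_states, n_inputs, n_steps, n_trials = 4, 6, 50, 200  # 6 muscle inputs

# Ground-truth dynamics standing in for the finite element simulations.
A_true = np.eye(n_states) + 0.01 * rng.standard_normal((n_states, n_states))
B_true = 0.1 * rng.standard_normal((n_states, n_inputs))

X_now, U_now, X_next = [], [], []
for _ in range(n_trials):
    x = rng.standard_normal(n_states)
    for t in range(n_steps):
        u = rng.standard_normal(n_inputs)   # random activation step
        x_new = A_true @ x + B_true @ u
        X_now.append(x); U_now.append(u); X_next.append(x_new)
        x = x_new

# Least-squares fit of the stacked parameter matrix [A B].
Z = np.hstack([np.array(X_now), np.array(U_now)])
Theta, *_ = np.linalg.lstsq(Z, np.array(X_next), rcond=None)
A_hat, B_hat = Theta[:n_states].T, Theta[n_states:].T
print(np.allclose(A_hat, A_true, atol=1e-6))  # noise-free data: exact recovery
```

In practice the tongue dynamics are nonlinear, so the identified model would presumably involve a richer parameterization, but the regression structure is the same.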
With the sensorimotor plant thus defined, we applied standard numerical techniques within a time loop to simulate the function of an optimal estimator/controller subject to sensory and motor uncertainty, and generated tongue trajectories from the initial posture to final endpoints defined either in postural space or in acoustic (F1-F2) space. These simulations allow us to explore how optimal control hypotheses can explain the average trajectories and the impact of sensorimotor noise on their variability.
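The estimator/controller time loop described above can be sketched as a finite-horizon LQG regulator: a backward Riccati recursion yields effort-optimal feedback gains, and a Kalman filter maintains the state estimate under motor and sensory noise. The plant matrices and noise levels below are toy values, not the identified tongue model:

```python
import numpy as np

# Minimal LQG sketch: backward Riccati recursion for feedback gains,
# then a forward time loop with Kalman filtering, driving a toy
# 2-state plant (position, velocity) toward the origin (the 'target').
rng = np.random.default_rng(1)
A = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [0.1]])
C = np.eye(2)                    # full but noisy observation
Q = np.diag([1.0, 0.1])          # accuracy cost
R = 0.01 * np.eye(1)             # control cost ('effort')
W = 1e-4 * np.eye(2)             # motor noise covariance
V = 1e-3 * np.eye(2)             # sensory noise covariance
T = 100

# Backward Riccati recursion for the gains L[t].
S, L = Q.copy(), []
for _ in range(T):
    K = np.linalg.solve(R + B.T @ S @ B, B.T @ S @ A)
    S = Q + A.T @ S @ (A - B @ K)
    L.append(K)
L = L[::-1]

# Forward loop: true state x, estimate x_hat, error covariance P.
x = np.array([1.0, 0.0])         # start away from the target
x_hat, P = x.copy(), 1e-3 * np.eye(2)
for t in range(T):
    u = -L[t] @ x_hat                                    # optimal control
    x = A @ x + B @ u + rng.multivariate_normal(np.zeros(2), W)
    y = C @ x + rng.multivariate_normal(np.zeros(2), V)
    x_hat = A @ x_hat + B @ u                            # predict
    P = A @ P @ A.T + W
    G = P @ C.T @ np.linalg.inv(C @ P @ C.T + V)         # Kalman gain
    x_hat = x_hat + G @ (y - C @ x_hat)                  # update
    P = (np.eye(2) - G @ C) @ P
print(abs(x[0]) < 0.2)           # position driven near the target
```

Running many such forward passes with independent noise draws gives both the average trajectory and its variability, which is the comparison the abstract proposes against measured tongue movements.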