FACTS: A Hierarchical Task-based Control Model of Speech Incorporating Sensory Feedback

We present a computational model of speech motor control that integrates vocal tract state prediction with sensory feedback. This hierarchical model, called FACTS, incorporates both a high-level and a low-level controller. The high-level controller orchestrates linguistically relevant speech tasks, which are represented as desired constrictions along the vocal tract (e.g., closure of the lips). Its output is passed to a low-level controller that issues motor commands at the level of the speech articulators in order to accomplish the desired constrictions. To generate these articulatory motor commands, the low-level controller relies on an estimate of the current state of the vocal tract. This estimate combines internal predictions about the consequences of issued motor commands with auditory and somatosensory feedback from the vocal tract, using a state estimation method based on the Unscented Kalman Filter. FACTS reproduces several important aspects of human speech behavior: (i) stable speech in the presence of noisy motor and sensory systems, (ii) partial acoustic compensation for auditory feedback perturbations, (iii) complete compensation for mechanical perturbations only when they interfere with current production goals, and (iv) the observed relationship between sensory acuity and the response to sensory perturbations.
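To make the estimate-then-control idea concrete, the sketch below shows, in Python, how an unscented Kalman filter can fuse a forward prediction of the vocal tract state (driven by the issued motor command) with noisy sensory observations, yielding the state estimate the low-level controller would act on. This is a minimal illustration of the general technique under assumed placeholder names and dimensions; the dynamics model `f`, observation model `h`, and noise settings are not the actual FACTS implementation.

```python
# Minimal sketch of unscented-Kalman-filter state estimation for an
# estimate-then-control loop. All names, dimensions, and models here are
# illustrative assumptions, not the FACTS codebase.
import numpy as np

class UnscentedKalmanFilter:
    """Fuses a forward prediction of the state (given the issued motor
    command u) with a noisy sensory observation z."""

    def __init__(self, f, h, n, Q, R, alpha=1e-3, beta=2.0, kappa=0.0):
        self.f, self.h = f, h            # dynamics model f(x, u), sensory model h(x)
        self.n = n                       # state dimension
        self.Q, self.R = Q, R            # process / sensory noise covariances
        self.lam = alpha**2 * (n + kappa) - n
        # Sigma-point weights for the mean (wm) and covariance (wc).
        self.wm = np.full(2 * n + 1, 1.0 / (2 * (n + self.lam)))
        self.wc = self.wm.copy()
        self.wm[0] = self.lam / (n + self.lam)
        self.wc[0] = self.wm[0] + (1.0 - alpha**2 + beta)

    def _sigma_points(self, x, P):
        # Rows are the 2n+1 sigma points: x and x +/- columns of sqrt((n+lam)P).
        S = np.linalg.cholesky((self.n + self.lam) * P)
        return np.vstack([x, x + S.T, x - S.T])

    def step(self, x, P, u, z):
        # Predict: propagate sigma points through the forward (dynamics) model.
        X = np.array([self.f(s, u) for s in self._sigma_points(x, P)])
        x_pred = self.wm @ X
        P_pred = self.Q + sum(w * np.outer(d, d)
                              for w, d in zip(self.wc, X - x_pred))
        # Update: compare predicted sensory consequences with actual feedback z.
        Z = np.array([self.h(s) for s in X])
        z_pred = self.wm @ Z
        Pzz = self.R + sum(w * np.outer(d, d)
                           for w, d in zip(self.wc, Z - z_pred))
        Pxz = sum(w * np.outer(dx, dz)
                  for w, dx, dz in zip(self.wc, X - x_pred, Z - z_pred))
        K = Pxz @ np.linalg.inv(Pzz)     # Kalman gain
        x_new = x_pred + K @ (z - z_pred)
        P_new = P_pred - K @ Pzz @ K.T
        return x_new, P_new

# Toy usage with an assumed 1-D "articulator": the dynamics nudge the state
# toward the commanded value u, and the sensor observes the state noisily.
f = lambda x, u: x + 0.1 * (u - x)       # assumed forward model
h = lambda x: x                          # assumed sensory mapping
ukf = UnscentedKalmanFilter(f, h, n=1, Q=np.eye(1) * 1e-4, R=np.eye(1) * 1e-2)
x_est, P = np.zeros(1), np.eye(1)
x_est, P = ukf.step(x_est, P, u=np.ones(1), z=np.array([0.12]))
```

In a FACTS-like setting, `f` would play the role of the internal prediction of articulatory consequences of the issued motor command, and `h` would map the articulatory state to predicted auditory and somatosensory signals, so that the filter's correction term drives only partial compensation when auditory feedback is perturbed while the somatosensory channel still matches the prediction.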
