Toward a Speech Neuroprosthesis.

Spoken communication is a basic human function, and the loss of the ability to speak can be devastating for affected individuals. Stroke or neurodegenerative conditions, such as amyotrophic lateral sclerosis, can result in paralysis or dysfunction of the vocal structures that produce speech. Current options are assistive devices that use residual movements, for example, cheek twitches or eye movements, to navigate alphabet displays to type out words.1 While some users depend on these alternative communication approaches, the devices tend to be slow, error-prone, and laborious.

A next generation of rehabilitative technologies currently being developed, called brain-computer interfaces (BCIs), directly reads out brain signals to replace lost function. Speech neuroprostheses have the potential to improve the quality of life not only of patients with neurological disease but also of patients who have lost speech from vocal tract injury (eg, from cancer or cancer-related surgery). Many potential approaches exist for reading out brain activity to restore communication. While both noninvasive and intracranial approaches are being explored, neurophysiological recordings of neuronal activity, measured from electrodes placed directly on the brain surface or from thin microwire electrode arrays inserted into the cortex, have provided encouraging results.

Most approaches have adopted the traditional augmentative and alternative communication strategy: the neuroprosthesis controls a computer cursor, usually by decoding neural signals associated with arm movements, to type out letters one by one. However, the best rates for spelling out words are still under 10 words per minute, despite rapid cursor control by some individuals.2 This may reflect fundamental limitations of spelling out words with a single cursor rather than limits on the ability to accurately read out brain activity. To begin to approach natural speaking rates (120-150 words per minute in healthy speakers), the accuracy and speed of BCIs must improve substantially; a rough calculation below illustrates the size of this gap. The Figure compares the communication rates across various modalities.2,3

Speech is among the most complex motor behaviors and has evolved into a means of efficient communication that is unique to humans. A defining aspect of speech is the rapid transmission of information, from brief, informal conversations to complex ideas communicated in a formal presentation. One reason speech can carry so much information is that the speech signal is generated by the precise, coordinated movements of approximately 100 muscles throughout the vocal tract, giving rise to the repertoire of speech sounds that make up a given language. The key to improving communication BCIs therefore lies in a neuroscientific understanding of how the brain controls the vocal tract during speech. For example, the motor map of the human homunculus contains neuronal populations involved in voluntary control of the larynx, lips, jaw, and tongue. While these representations underlie many functions, such as swallowing and kissing, they are specialized in humans for producing speech features, such as consonants, vowels, and prosodic intonation. In recent years, understanding has deepened substantially, moving from a general picture of where such functions are located in the brain to the more fundamental question of how these speech patterns are generated by the underlying neural substrates.
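To make the rate gap noted above concrete, the short Python sketch below converts words per minute into a rough upper-bound information rate. Only the under-10 and 120-150 words-per-minute figures come from the text; the other rates, the 6-characters-per-word estimate, and the 27-symbol alphabet are illustrative assumptions, and the calculation ignores the redundancy of natural language.

```python
# Rough, illustrative comparison of communication rates (not the Figure's data).
# Assumption: ~5 letters per word plus a space, drawn from an effective alphabet
# of 27 symbols, so each character carries about log2(27) bits at most.

import math

BITS_PER_CHAR = math.log2(27)   # ~4.75 bits, ignoring linguistic redundancy
CHARS_PER_WORD = 6              # ~5 letters + 1 space, a common rough estimate

def bits_per_second(words_per_minute: float) -> float:
    """Upper-bound information rate implied by a words-per-minute figure."""
    return words_per_minute * CHARS_PER_WORD * BITS_PER_CHAR / 60.0

modalities = {
    "assistive spelling device": 2,    # illustrative placeholder value
    "cursor-based spelling BCI": 8,    # "still under 10 words per minute"
    "natural speech (low end)": 120,
    "natural speech (high end)": 150,
}

for name, wpm in modalities.items():
    print(f"{name:28s} {wpm:5.0f} wpm ~ {bits_per_second(wpm):5.1f} bits/s")
```

Even under these generous assumptions, cursor-based spelling sits more than an order of magnitude below natural speech.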
Recent discoveries have been enabled by high-resolution (eg, millimeter and millisecond) neurophysiological recordings in humans, for example, in patients with epilepsy who volunteered for research studies involving brain electrodes implanted to localize a seizure focus. These rare opportunities have yielded the discovery that neural commands produce vocal tract “gestures,” low-dimensional coordinative patterns of movement.4 Gestures produce specific shapes in the vocal tract, for example, the closure of the lips and jaw to make a “p” sound, and are sequenced together to produce fluent sentences.

A natural application of these insights is to decode speech from brain activity. A recent report indicated that it is possible to synthesize speech by decoding directly from the human cortex while study participants spoke full sentences: brain signals drove the gestural movements of a computational “virtual vocal tract” to generate audible speech (Video).5 It has also been shown to be possible to translate brain signals into text in real time.6

While these developments are promising, several challenges and opportunities remain in realizing high-performance speech BCIs. Most demonstrations of successful speech decoding have been carried out in study participants with intact speech function, and actual speaking was used to train the decoding algorithms. A major challenge is how to achieve similar performance in people who are paralyzed, for whom no speech data are available. Imagined speech does not appear to be sufficient for decoding, and the neural code for inner speech or pure thoughts is not clear at this time. Learning to control a speech neuroprosthesis may be possible, but it would be akin to relearning how to speak, if not far more difficult. One potential option, therefore, is to use a person’s native neural code for speech, which is presumably dormant in paralyzed individuals. Further, closed-loop real-time feedback has demonstrated promise in other neuroprosthetic applications and might also have a role in speech neuroprostheses.
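The gesture-based account above suggests a natural two-stage decoding architecture: map neural activity to low-dimensional gesture trajectories, then map gestures to sound, as with the virtual vocal tract. The Python sketch below illustrates only that structure, using simulated data and simple ridge-regression mappings; all array shapes, feature choices, and the linear models are illustrative assumptions, not the decoders used in the cited studies.

```python
# Conceptual sketch of a two-stage speech decoder:
#   neural features -> low-dimensional vocal tract gestures -> acoustic features.
# All data here are simulated, and the linear maps are stand-ins for the far
# more expressive models used in actual speech-decoding work.

import numpy as np

rng = np.random.default_rng(0)

T, N_ELECTRODES, N_GESTURES, N_ACOUSTIC = 1000, 128, 12, 32

# Simulated training data standing in for real recordings:
neural = rng.standard_normal((T, N_ELECTRODES))   # eg, high-gamma band features
gestures = rng.standard_normal((T, N_GESTURES))   # articulatory trajectories
acoustic = rng.standard_normal((T, N_ACOUSTIC))   # eg, spectral features

def ridge_fit(X: np.ndarray, Y: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Closed-form ridge regression: W = (X'X + lam*I)^-1 X'Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

# Stage 1: decode gesture trajectories from neural activity.
W_neural_to_gesture = ridge_fit(neural, gestures)
# Stage 2: map gestures to acoustics (the "virtual vocal tract" step).
W_gesture_to_acoustic = ridge_fit(gestures, acoustic)

# At inference time, chain the two stages on new neural data.
new_neural = rng.standard_normal((10, N_ELECTRODES))
decoded_gestures = new_neural @ W_neural_to_gesture
decoded_acoustic = decoded_gestures @ W_gesture_to_acoustic
print(decoded_acoustic.shape)   # (10, 32)
```

In published demonstrations, such mappings are learned from data recorded while participants actually speak, which is precisely why the absence of speech data in paralyzed individuals is such a central challenge for this approach.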