Speech interfaces from an evolutionary perspective

How does the human brain react when confronted by a talking computer? Answers from psychological research and its design implications help define the limits of what computers should say and how they might say it.

Clifford Nass and Li Gong

COMMUNICATIONS OF THE ACM, September 2000/Vol. 43, No. 9, p. 36

Similarly, the human head and brain are uniquely evolved to produce speech [4, 11]. Compared to other primates, humans have remarkably well-developed and controllable muscles around the lips and cheeks. Indeed, more of the motor cortex (particularly Broca’s area) is devoted to vocalization than to any other function, in sharp contrast to every other animal, including primates. Only Homo sapiens can use the tongue, cheeks, and lips, together with the teeth, to produce 14 phonemes per second [11]; even the Neanderthals, who are known for having large brains and elaborate cultural and social behaviors, could not sustain speech due to the structure of their breathing apparatus [4].

Modern humans are also exquisitely tuned for speech recognition. Infants as young as one day old show relatively greater left hemisphere electrical activity to speech sounds and relatively greater right hemisphere activity to non-speech sounds [11]; by 22 days old, infants exhibit the adult tendency for right-ear (and left-brain-hemisphere) dominance for spoken sounds (regardless of language) and left-ear (and right-brain-hemisphere) dominance for music and other sounds [11].

Why is the fundamental and uniquely human propensity for speech so important in the design of