Characterizing and Processing Robot-Directed Speech

Speech directed at infants and pets has properties that distinguish it from speech among adults (6). Some of those properties are potentially useful for language learning. Through careful design of form and behavior, robots can hope to evoke a similar speech register and take advantage of these properties. We report some preliminary data to support this claim, based on experiments carried out with the infant-like robot Kismet (4). We then show how we can build a language model around an initial vocabulary, perhaps acquired from "cooperative" speech, and bootstrap from it to identify further candidate vocabulary items drawn from arbitrary speech in an unsupervised manner. We show how to cast this process in a form that can be largely implemented using a conventional speech recognition system (8), even though such systems are designed with very different applications in mind. This is advantageous since, after decades of research, such systems are expert at making probabilistically sound judgments from acoustic, phonological, and language models.
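The bootstrapping idea can be illustrated with a toy sketch. This is not the system described in the paper, which uses a full speech recognizer with acoustic, phonological, and language models; here utterances are idealized as clean phone strings, the seed lexicon and all data are invented for illustration, and a simple greedy matcher stands in for recognition. Phone spans not covered by any known word are collected, and spans that recur become candidate vocabulary items:

```python
from collections import Counter

# Illustrative seed lexicon: word -> space-separated phone string.
# These entries and phone symbols are assumptions, not the paper's data.
SEED_LEXICON = {"look": "l uh k", "kismet": "k ih z m eh t"}

def find_candidates(utterances, lexicon, min_count=2):
    """Return phone spans not explained by the lexicon that recur
    at least min_count times across the utterances."""
    known = set(lexicon.values())
    leftovers = Counter()
    for utt in utterances:
        phones = utt.split()
        i, buf = 0, []
        while i < len(phones):
            # Greedy left-to-right match against known pronunciations;
            # unmatched phones accumulate in buf as a candidate span.
            matched = False
            for pron in known:
                p = pron.split()
                if phones[i:i + len(p)] == p:
                    if buf:
                        leftovers[" ".join(buf)] += 1
                        buf = []
                    i += len(p)
                    matched = True
                    break
            if not matched:
                buf.append(phones[i])
                i += 1
        if buf:
            leftovers[" ".join(buf)] += 1
    return [span for span, n in leftovers.items() if n >= min_count]

utts = [
    "l uh k b ao l",         # "look ball"
    "b ao l k ih z m eh t",  # "ball kismet"
    "l uh k b ao l",         # "look ball" again
]
print(find_candidates(utts, SEED_LEXICON))  # → ['b ao l']
```

In the full framework, this role is played by an out-of-vocabulary model inside the recognizer, which scores unknown spans probabilistically rather than by exact phone matching; recurring high-confidence spans then become candidates for addition to the vocabulary.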

[1] Cynthia Breazeal et al., Recognition of Affective Communicative Intent in Robot-Directed Speech, 2002, Auton. Robots.

[2] Denis Burnham et al., Are you my little pussy-cat? Acoustic, phonetic and affective qualities of infant- and pet-directed speech, 1998, ICSLP.

[3] I. Pepperberg, Referential mapping: A technique for attaching functional significance to the innovative utterances of an African Grey parrot (Psittacus erithacus), 1990, Applied Psycholinguistics.

[4] P. Jusczyk et al., Infants' Detection of the Sound Patterns of Words in Fluent Speech, 1995, Cognitive Psychology.

[5] Herbert Gish et al., Phonetic-based word spotter: various configurations and application to event spotting, 1993, EUROSPEECH.

[6] James Glass et al., Modelling out-of-vocabulary words for robust speech recognition, 2002.

[7] James L. Morgan et al., Signal to syntax: bootstrapping from speech to grammar in early acquisition, 1996.

[8] Julia Hirschberg et al., Prosodic cues to recognition errors, 1999.

[9] C. Trevarthen, Communication and cooperation in early infancy: a description of primary intersubjectivity, 1979.

[10] Timothy J. Hazen et al., A comparison and combination of methods for OOV word detection and word confidence scoring, 2001, IEEE ICASSP.

[11] James R. Glass et al., Speechbuilder: facilitating spoken dialogue system development, 2001, INTERSPEECH.

[12] Paul R. Cohen et al., Toward natural language interfaces for robotic agents: grounding linguistic meaning in sensors, 2000, AGENTS '00.

[13] James R. Glass et al., A probabilistic framework for feature-based speech recognition, 1996, ICSLP.

[14] James R. Glass et al., Modeling out-of-vocabulary words for robust speech recognition, 2000, INTERSPEECH.

[15] Victor Zue et al., Conversational interfaces: advances and challenges, 1997, Proceedings of the IEEE.

[16] E. Bard et al., The unintelligibility of speech to children: effects of referent availability, 1994, Journal of Child Language.

[17] Alex Pentland et al., Learning words from sights and sounds: a computational model, 2002, Cogn. Sci.

[18] P. Jusczyk, The discovery of spoken language, 1997.

[19] C. Breazeal, Sociable Machines: Expressive Social Exchange Between Humans and Robots, 2000.

[20] M. Brent et al., The role of exposure to isolated words in early vocabulary development, 2001, Cognition.

[21] Victor Zue et al., JUPITER: a telephone-based conversational interface for weather information, 2000, IEEE Trans. Speech Audio Process.