Baby Ears: a recognition system for affective vocalizations

We collected more than 500 utterances from adults talking to their infants. We automatically classified 65% of the strongest utterances correctly as approval, attentional bids, or prohibition, using several pitch and formant measures and a multidimensional Gaussian mixture-model discriminator. As previous studies have shown, changes in pitch are an important cue for affective messages; we found that timbre, as captured by cepstral coefficients, is also important. In this test, the utterances of female speakers were easier to classify than those of male speakers. We hope this research will allow us to build machines that sense the "emotional state" of a user.
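
The discriminator described above scores each utterance's feature vector against a per-class probability model and picks the most likely class. A minimal sketch of that idea is below, with two simplifications: the feature vectors are synthetic stand-ins (not values from the Baby Ears corpus), and a single diagonal-covariance Gaussian per class stands in for the paper's multidimensional Gaussian mixture model.

```python
import math
import random

# Hypothetical 3-D feature vectors standing in for pitch/formant/cepstral
# measures; the numbers are synthetic, not from the Baby Ears corpus.
random.seed(0)

def make_samples(center, n=30, spread=0.5):
    return [[random.gauss(c, spread) for c in center] for _ in range(n)]

TRAIN = {
    "approval":        make_samples([5.0, 3.0, 1.0]),
    "attentional bid": make_samples([3.0, 1.0, 2.0]),
    "prohibition":     make_samples([1.0, 0.5, 0.5]),
}

def fit_gaussian(samples):
    """Fit one diagonal-covariance Gaussian per class (a 1-component mixture)."""
    d = len(samples[0])
    mean = [sum(s[i] for s in samples) / len(samples) for i in range(d)]
    var = [max(sum((s[i] - mean[i]) ** 2 for s in samples) / len(samples), 1e-6)
           for i in range(d)]
    return mean, var

def log_likelihood(x, mean, var):
    # Sum of per-dimension Gaussian log-densities (diagonal covariance).
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

MODELS = {label: fit_gaussian(samples) for label, samples in TRAIN.items()}

def classify(x):
    """Assign the class whose model gives the feature vector the highest likelihood."""
    return max(MODELS, key=lambda label: log_likelihood(x, *MODELS[label]))

print(classify([4.9, 3.1, 1.1]))  # a vector near the "approval" cluster
```

A full GMM would fit several such components per class with the EM algorithm, but the decision rule, maximum class-conditional likelihood, is the same.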
