Robots designed to learn from and interact with humans require an intuitive method for humans to communicate with them. Normal human speech is difficult to process, requiring many kinds of complex analysis before a robot can interpret it. An intermediate method of communication is recognition of prosody, the affective content of speech. Using prosody recognition, a human interacting with a robot can reward or punish its actions by praising or scolding it. In this project, prosody recognition of male voices is performed by feature-based analysis of sound files containing short utterances, recorded from subjects directed to emulate infant-directed speech, which generally contains exaggerated prosody (Breazeal, C. and Aryananda, L., 2000). The features are extracted from the energy and pitch contours in a preprocessing stage. The classifier discriminates among four affective classes of speech and neutral utterances; the four classes are prohibition, attentional bids, approval, and soothing, while neutral utterances carry none of these affective intents. Discrimination is performed with a multistage k-nearest-neighbor classifier. The five-way single-stage classifier achieves 62.5 percent accuracy on the entire male speech data set, while the female single-stage classifier classifies 66.7 percent correctly; chi-square analysis yields p ≤ 0.001 for each. The data suggest that while female voice data may be somewhat easier to classify than male, there are no fundamental differences that make male utterances unsuitable for classification.
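The core classification step described above can be sketched as a k-nearest-neighbor vote over per-utterance feature vectors. The sketch below is a minimal illustration only: the feature values, the choice of two features (mean pitch and mean energy), and the training points are invented for demonstration, not taken from the paper's data, and in practice features should be normalized so that pitch in Hz does not dominate the distance.

```python
import numpy as np

def knn_classify(train_X, train_y, x, k=5):
    """Classify feature vector x by majority vote among its k nearest
    training examples, using Euclidean distance."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]          # indices of the k closest points
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]         # most common label among neighbors

# Hypothetical 2-D features per utterance: [mean pitch (Hz), mean energy].
# The five classes mirror the paper's: prohibition, attentional bid,
# approval, soothing, and neutral. All numbers are illustrative.
train_X = np.array([
    [180.0, 0.90], [175.0, 0.85],   # prohibition: low pitch, high energy
    [320.0, 0.80], [310.0, 0.75],   # attentional bid: high pitch, high energy
    [300.0, 0.60], [290.0, 0.65],   # approval
    [220.0, 0.20], [210.0, 0.25],   # soothing: low energy
    [240.0, 0.45], [250.0, 0.50],   # neutral
])
train_y = np.array(["prohibition", "prohibition", "attention", "attention",
                    "approval", "approval", "soothing", "soothing",
                    "neutral", "neutral"])

# A new utterance with low pitch and high energy lands near the
# prohibition examples.
pred = knn_classify(train_X, train_y, np.array([178.0, 0.88]), k=3)
```

A multistage variant, as used in the project, would chain several such classifiers, with early stages separating coarse groups (e.g., high-energy vs. low-energy utterances) before a final fine-grained vote.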
[1] Malcolm Slaney et al., "Baby Ears: A Recognition System for Affective Vocalizations," Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '98), 1998.
[2] Mari Ostendorf et al., "TOBI: A Standard for Labeling English Prosody," ICSLP, 1992.
[3] Ian H. Witten et al., "Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations," SGMD, 2002.
[4] Cynthia Breazeal et al., "Recognition of Affective Communicative Intent in Robot-Directed Speech," Autonomous Robots, 2002.
[5] Xuejing Sun et al., "Pitch Determination and Voice Quality Analysis Using Subharmonic-to-Harmonic Ratio," 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2002.
[6] N. Fox et al., "Social Perception in Infants," 1985.
[7] A. Fernald et al., "Intonation and Communicative Intent in Mothers' Speech to Infants: Is the Melody the Message?," Child Development, 1989.