Corpus-based generation of head and eyebrow motion for an embodied conversational agent

Humans are known to use a wide range of non-verbal behaviour while speaking. Generating naturalistic embodied speech for an artificial agent is therefore an application where techniques that draw directly on recorded human motions can be helpful. We present a system that uses corpus-based selection strategies to specify the head and eyebrow motion of an animated talking head. We first describe how a domain-specific corpus of facial displays was recorded and annotated, and outline the regularities that were found in the data. We then present two different methods of selecting motions for the talking head based on the corpus data: one that chooses the majority option in all cases, and one that makes a weighted choice among all of the options. We compare these methods to each other in two ways: through cross-validation against the corpus, and by asking human judges to rate the output. The results of the two evaluation studies differ: the cross-validation study favoured the majority strategy, while the human judges preferred schedules generated using weighted choice. The judges in the second study also showed a preference for the original corpus data over the output of either of the generation strategies.
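Abstract note: the two selection strategies can be illustrated with a minimal sketch. The code below is not the authors' implementation; the corpus contents, context labels, and display names are hypothetical, and it only shows the difference between always taking the most frequent display for a context (majority) and sampling displays in proportion to their corpus frequencies (weighted choice).

```python
import random
from collections import Counter

# Hypothetical corpus: each linguistic context maps to the facial displays
# observed with it and their counts (labels are illustrative only).
corpus = {
    "emphasis": Counter({"nod+raise": 14, "nod": 9, "none": 7}),
    "contrast": Counter({"raise": 11, "nod": 4, "none": 5}),
}

def majority_choice(context):
    """Always return the single most frequent display for this context."""
    return corpus[context].most_common(1)[0][0]

def weighted_choice(context):
    """Sample a display with probability proportional to its corpus frequency."""
    displays, counts = zip(*corpus[context].items())
    return random.choices(displays, weights=counts, k=1)[0]

print(majority_choice("emphasis"))   # always "nod+raise"
print(weighted_choice("emphasis"))   # varies from call to call
```

Under this sketch, the majority strategy produces deterministic, repetitive output, while weighted choice reproduces the variability seen in the recorded data, which matches the paper's finding that human judges preferred the weighted-choice schedules.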
