Autonomous Speaker Agent

Autonomous Speaker Agent is a graphically embodied, animated agent (a virtual character) capable of reading plain English text and rendering it as speech accompanied by appropriate, natural-looking facial gestures. The system uses lexical analysis and statistical models of facial gestures to generate gestures related to the spoken text. It is intended for the automatic creation of realistically animated virtual speakers, such as newscasters and storytellers, and incorporates the characteristics of such speakers captured from training video clips. Autonomous Speaker Agent is based on a visual text-to-speech system that generates lip movement synchronized with the generated speech. This is extended with eye blinks, head and eyebrow motion, and a simple gaze-following behavior. The result is full facial animation produced automatically from plain English text.
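To make the described pipeline concrete, the following is a minimal sketch of how statistically triggered gestures might be overlaid on a TTS word timeline. All names here (GestureEvent, TRIGGER_PROB, the specific probabilities and rates) are illustrative assumptions, not the paper's actual models or API; the statistical gesture models are reduced to simple per-word trigger probabilities, and blinks to a Poisson process.

```python
# Hypothetical sketch of the gesture-generation pipeline: lexical units from
# the TTS engine trigger gestures probabilistically, with periodic blinks
# overlaid. Probabilities are placeholders, not values from the paper.
import random
from dataclasses import dataclass

@dataclass
class GestureEvent:
    kind: str        # e.g. "nod", "eyebrow_raise", "blink"
    start: float     # seconds from utterance start
    duration: float  # seconds

# Assumed per-word trigger probabilities, standing in for statistical models
# estimated from training video clips.
TRIGGER_PROB = {"nod": 0.15, "eyebrow_raise": 0.10}
BLINK_RATE_HZ = 0.3  # roughly one blink every ~3 s of speech

def generate_gestures(word_timings):
    """word_timings: list of (word, start_s, end_s) from the TTS engine."""
    events = []
    for word, start, end in word_timings:
        for kind, prob in TRIGGER_PROB.items():
            if random.random() < prob:
                events.append(GestureEvent(kind, start, end - start))
    # Overlay eye blinks as a Poisson process, independent of the words.
    t, utterance_end = 0.0, word_timings[-1][2]
    while True:
        t += random.expovariate(BLINK_RATE_HZ)
        if t >= utterance_end:
            break
        events.append(GestureEvent("blink", t, 0.15))
    return sorted(events, key=lambda e: e.start)

if __name__ == "__main__":
    timings = [("hello", 0.0, 0.4), ("world", 0.5, 1.0)]
    for event in generate_gestures(timings):
        print(event)
```

In the full system these gesture events would be merged with the viseme track from the visual text-to-speech component to drive the MPEG-4-style facial animation; this sketch only shows the statistical triggering step.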
