A Web-Oriented Java3D Talking Head

Facial animation denotes the class of systems that synchronize speech with an animated face model; such systems are known as Talking Heads or Talking Faces. In parallel, simple dialogue systems called chatbots have been developed: software agents able to interact with users through pattern-matching rules. In this paper a Talking Head designed for building a chatbot is presented. An answer is generated in textual form, triggered by an input query; the answer is then converted into a facial animation using a 3D face model whose lip movements are synchronized with the sound produced by a speech synthesis module. Our Talking Head exploits the naturalness of facial animation and provides a real-time interactive interface to the user. Moreover, it is specifically suited for use on the web, which imposes a set of requirements: simple installation, visual quality, fast download, and real-time interactivity. The web infrastructure follows the client-server model: the chatbot, Natural Language Processing and Digital Signal Processing services are delegated to the server, while the client handles animation and synchronization. This way, the server can serve multiple client requests concurrently. The conversation module has been implemented using the A.L.I.C.E. (Artificial Linguistic Internet Computer Entity) technology. The output of the chatbot is given as input to the Natural Language Processing module (Comedia Speech), which incorporates a text analyzer, a letter-to-sound module and a prosody generation module. The client, through the synchronization module, computes the actual duration of the animation and the duration of each phoneme, and consequently of each viseme. The morphing module performs the animation of the facial model and the voice playback. As a result, the user sees the answer to the question both in textual form and as a visual animation.
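The synchronization step described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the class and record names are hypothetical, and the phoneme-to-viseme table is a small placeholder (real systems use a standard inventory such as the MPEG-4 viseme set). It shows how per-phoneme durations from the speech synthesis module can be turned into a viseme timeline, with the total animation duration obtained as the sum of phoneme durations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class VisemeScheduler {
    /** A phoneme with its duration in milliseconds (hypothetical data type). */
    public record Phoneme(String symbol, int durationMs) {}

    /** A viseme scheduled at a start time, to be consumed by the morphing module. */
    public record VisemeEvent(String viseme, int startMs, int durationMs) {}

    // Hypothetical phoneme-to-viseme table; only a few entries for illustration.
    private static final Map<String, String> PHONEME_TO_VISEME = Map.of(
            "m", "bilabial", "p", "bilabial", "b", "bilabial",
            "a", "open", "o", "rounded", "sil", "neutral");

    /** Convert a phoneme sequence into a viseme timeline with start times. */
    public static List<VisemeEvent> schedule(List<Phoneme> phonemes) {
        List<VisemeEvent> timeline = new ArrayList<>();
        int t = 0;
        for (Phoneme p : phonemes) {
            String v = PHONEME_TO_VISEME.getOrDefault(p.symbol(), "neutral");
            timeline.add(new VisemeEvent(v, t, p.durationMs()));
            t += p.durationMs(); // visemes inherit their phoneme's duration
        }
        return timeline;
    }

    /** Total animation time is the sum of the phoneme durations. */
    public static int totalDurationMs(List<Phoneme> phonemes) {
        return phonemes.stream().mapToInt(Phoneme::durationMs).sum();
    }

    public static void main(String[] args) {
        List<Phoneme> word = List.of(
                new Phoneme("m", 80), new Phoneme("a", 120), new Phoneme("p", 90));
        System.out.println(totalDurationMs(word));           // prints 290
        System.out.println(schedule(word).get(2).startMs()); // prints 200
    }
}
```

In a client-server arrangement like the one described, only the phoneme symbols and durations need to travel from the server's DSP service to the client, which then drives the morphing of the face model from the resulting timeline.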
