Visual Speech Animation

[1] Jörn Ostermann, et al. Lifelike talking faces for interactive services, 2003, Proc. IEEE.

[2] Michael M. Cohen, et al. Modeling Coarticulation in Synthetic Visual Speech, 1993.

[3] Frank K. Soong, et al. A deep bidirectional LSTM approach for video-realistic talking head, 2016, Multimedia Tools and Applications.

[4] Ricardo Gutierrez-Osuna, et al. Audio/visual mapping with cross-modal hidden Markov models, 2005, IEEE Transactions on Multimedia.

[5] Justus Thies, et al. Face2Face: Real-Time Face Capture and Reenactment of RGB Videos, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Jörn Ostermann, et al. Talking faces - technologies and applications, 2004, ICPR 2004.

[7] Keiichi Tokuda, et al. HMM-based text-to-audio-visual speech synthesis, 2000, INTERSPEECH.

[8] Gérard Bailly, et al. Animating Virtual Speakers or Singers from Audio: Lip-Synching Facial Animation, 2009, EURASIP J. Audio Speech Music Process.

[9] D. Massaro. Perceiving talking faces: from speech perception to a behavioral principle, 1999.

[10] Christoph Bregler, et al. Video Rewrite: Driving Visual Speech with Audio, 1997, SIGGRAPH.

[11] Matti Pietikäinen, et al. Facial 3D Shape Estimation from Images for Visual Speech Animation, 2014, 2014 22nd International Conference on Pattern Recognition.

[12] Hans Peter Graf, et al. Photo-Realistic Talking-Heads from Image Samples, 2000, IEEE Trans. Multim.

[13] Frederick I. Parke, et al. Computer generated animation of faces, 1972, ACM Annual Conference.

[14] Björn Stenger, et al. Expressive Visual Text-to-Speech Using Active Appearance Models, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[15] D. Massaro. Speech Perception By Ear and Eye: A Paradigm for Psychological Inquiry, 1989.

[16] Dong Yu, et al. Deep Learning: Methods and Applications, 2014, Found. Trends Signal Process.

[17] Gregor Hofer, et al. HMM-based automatic eye-blink synthesis from speech, 2009, INTERSPEECH.

[18] Frédéric H. Pighin, et al. Expressive speech-driven facial animation, 2005, ACM Trans. Graph.

[19] Gang Chen, et al. Computer-Assisted Audiovisual Language Learning, 2012, Computer.

[20] Lei Xie, et al. Head motion synthesis from speech using deep neural networks, 2015, Multimedia Tools and Applications.

[21] Frank K. Soong, et al. Synthesizing photo-real talking head via trajectory-guided sample selection, 2010, INTERSPEECH.

[22] Tony Ezzat, et al. Trainable videorealistic speech animation, 2002, Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings.

[23] Maxine Eskénazi, et al. An overview of spoken language technology for education, 2009, Speech Commun.

[24] Matthew R. Scott, et al. Towards a Specialized Search Engine for Language Learners [Point of View], 2011.

[25] Tara N. Sainath, et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups, 2012, IEEE Signal Processing Magazine.

[26] Zhi-Jie Yan, et al. An HMM trajectory tiling (HTT) approach to high quality TTS, 2010, INTERSPEECH.

[27] Atef Ben Youssef, et al. Articulatory features for speech-driven head motion synthesis, 2013, INTERSPEECH.

[28] Jun Du, et al. Robust speech recognition with speech enhanced deep neural networks, 2014, INTERSPEECH.

[29] Phil Hoole, et al. Announcing the Electromagnetic Articulography (Day 1) Subset of the mngu0 Articulatory Corpus, 2011, INTERSPEECH.

[30] Sascha Fagel, et al. An articulation model for audiovisual speech synthesis - Determination, adjustment, evaluation, 2004, Speech Commun.

[31] Moshe Mahler, et al. Dynamic units of visual speech, 2012, SCA '12.

[32] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.

[33] B. Seidlhofer. Common ground and different realities: world Englishes and English as a lingua franca, 2009.

[34] Zhigang Deng, et al. Live Speech Driven Head-and-Eye Motion Generators, 2012, IEEE Transactions on Visualization and Computer Graphics.

[35] Justus Thies, et al. Demo of Face2Face: real-time face capture and reenactment of RGB videos, 2016, SIGGRAPH Emerging Technologies.

[36] Alan W. Black, et al. Unit selection in a concatenative speech synthesis system using a large speech database, 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[37] Keiichi Tokuda, et al. Speech parameter generation algorithms for HMM-based speech synthesis, 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[38] Hui Chen, et al. Phoneme-level articulatory animation in pronunciation training, 2012, Speech Commun.

[39] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.

[40] H. McGurk, et al. Hearing lips and seeing voices, 1976, Nature.

[41] Gwenn Englebienne, et al. A probabilistic model for generating realistic lip movements from speech, 2007, NIPS.

[42] Kuldip K. Paliwal, et al. Bidirectional recurrent neural networks, 1997, IEEE Trans. Signal Process.

[43] Bo Zhang, et al. A New Phonetic Candidate Generator for Improving Search Query Efficiency, 2011, INTERSPEECH.

[44] Anna Hjalmarsson, et al. Embodied conversational agents in computer assisted language learning, 2009, Speech Commun.

[45] Gérard Bailly, et al. LIPS2008: visual speech synthesis challenge, 2008, INTERSPEECH.

[46] Frank K. Soong, et al. High quality lips animation with speech and captured facial action unit as A/V input, 2012, Proceedings of The 2012 Asia Pacific Signal and Information Processing Association Annual Summit and Conference.

[47] Tony Ezzat, et al. Visual Speech Synthesis by Morphing Visemes, 2000, International Journal of Computer Vision.

[48] Zhigang Deng, et al. Data-Driven 3D Facial Animation, 2007.

[49] Paul Taylor, et al. Text-to-Speech Synthesis, 2009.

[50] Gérard Bailly, et al. Visual articulatory feedback for phonetic correction in second language learning, 2010.

[51] Frank K. Soong, et al. HMM trajectory-guided sample selection for photo-realistic talking head, 2014, Multimedia Tools and Applications.

[52] W. H. Sumby, et al. Erratum: Visual Contribution to Speech Intelligibility in Noise [J. Acoust. Soc. Am. 26, 212 (1954)], 1954.

[53] Hao Li, et al. Realtime performance-based facial animation, 2011, ACM Trans. Graph.

[54] Algirdas Pakstas, et al. MPEG-4 Facial Animation: The Standard, Implementation and Applications, 2002.

[55] Lei Xie, et al. Expressive talking avatar synthesis and animation, 2015, Multimedia Tools and Applications.

[56] Igor S. Pandzic, et al. MPEG-4 Facial Animation, 2002.

[57] Keiichi Tokuda, et al. Text-to-visual speech synthesis based on parameter generation from HMM, 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[58] Frank K. Soong, et al. Rendering a personalized photo-real talking head from short video footage, 2010, 2010 7th International Symposium on Chinese Spoken Language Processing.

[59] Lei Xie, et al. Realistic Mouth-Synching for Speech-Driven Talking Face Using Articulatory Modelling, 2007, IEEE Transactions on Multimedia.

[60] Hans Peter Graf, et al. Sample-based synthesis of photo-realistic talking heads, 1998, Proceedings Computer Animation '98 (Cat. No.98EX169).

[61] Zhigang Deng, et al. Rigid Head Motion in Expressive Speech Animation: Analysis and Synthesis, 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[62] John P. Lewis, et al. Automated eye motion using texture synthesis, 2005, IEEE Computer Graphics and Applications.

[63] Lei Xie, et al. Articulatory movement prediction using deep bidirectional long short-term memory based recurrent neural networks and word/phone embeddings, 2015, INTERSPEECH.

[64] Lei Xie, et al. Photo-real talking head with deep bidirectional LSTM, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[65] Gérard Bailly, et al. Analyzing Gaze During Face-to-Face Interaction, 2007, IVA.

[66] Sepp Hochreiter, et al. The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions, 1998, Int. J. Uncertain. Fuzziness Knowl. Based Syst.

[67] Timothy F. Cootes, et al. Active Appearance Models, 2001, IEEE Trans. Pattern Anal. Mach. Intell.

[68] Ronald J. Williams, et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks, 1989, Neural Computation.

[69] Frank K. Soong, et al. High quality lip-sync animation for 3D photo-realistic talking head, 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[70] Simon King, et al. Letter-to-Sound Pronunciation Prediction Using Conditional Random Fields, 2011, IEEE Signal Processing Letters.

[71] Sam T. Roweis, et al. EM Algorithms for PCA and SPCA, 1997, NIPS.

[72] Lianhong Cai, et al. Head and facial gestures synthesis using PAD model for an expressive talking avatar, 2014, Multimedia Tools and Applications.

[73] Frank K. Soong, et al. Synthesizing visual speech trajectory with minimum generation error, 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[74] Yongxin Wang, et al. Emotional Audio-Visual Speech Synthesis Based on PAD, 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[75] Lei Xie, et al. A coupled HMM approach to video-realistic speech animation, 2007, Pattern Recognit.

[76] Karen Kukich, et al. Techniques for automatically correcting words in text, 1992, CSUR.

[77] Zhi-Jie Yan, et al. Rich-context Unit Selection (RUS) approach to high quality TTS, 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[78] R. Plomp, et al. Speechreading supplemented with formant-frequency information from voiced speech, 1985, The Journal of the Acoustical Society of America.

[79] Heiga Zen, et al. Statistical parametric speech synthesis using deep neural networks, 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[80] Jianwu Dang, et al. Visualization of Mandarin articulation by using a physiological articulatory model, 2013, 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference.