Out of Time: Automated Lip Sync in the Wild

The goal of this work is to determine the audio-video synchronisation between mouth motion and speech in a video.
