Out of Time: Automated Lip Sync in the Wild

The goal of this work is to determine the audio-video synchronisation between mouth motion and speech in a video.

Joon Son Chung | Andrew Zisserman

[1] Matti Pietikäinen, et al. A Compact Representation of Visual Speech Data Using Latent Variables, 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2] Takeo Kanade, et al. An Iterative Image Registration Technique with an Application to Stereo Vision, 1981, IJCAI.

[3] David F. McAllister, et al. Lip synchronization for animation, 1997, SIGGRAPH '97.

[4] Andrew Zisserman, et al. Return of the Devil in the Details: Delving Deep into Convolutional Nets, 2014, BMVC.

[5] Igor S. Pandzic, et al. A Real-Time Lip Sync System Using a Genetic Algorithm for Automatic Neural Network Configuration, 2005, 2005 IEEE International Conference on Multimedia and Expo.

[6] Andrew Zisserman, et al. Two-Stream Convolutional Networks for Action Recognition in Videos, 2014, NIPS.

[7] Stan Davis, et al. Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Se, 1980.

[8] Rainer Lienhart, et al. Reliable Transition Detection in Videos: A Survey and Practitioner's Guide, 2001, Int. J. Image Graph.

[9] Matthew Richardson, et al. Compressing LSTMs into CNNs, 2015, ArXiv.

[10] Vaibhava Goel, et al. Detecting audio-visual synchrony using deep neural networks, 2015, INTERSPEECH.

[11] Gérard Chollet, et al. Audiovisual Speech Synchrony Measure: Application to Biometrics, 2007, EURASIP J. Adv. Signal Process.

[12] John Lewis, et al. Automated lip-sync: Background and techniques, 1991, Comput. Animat. Virtual Worlds.

[13] Enrique Argones-Rúa, et al. Audio-visual speech asynchrony detection using co-inertia analysis and coupled hidden Markov models, 2009, Pattern Analysis and Applications.

[14] Michael S. Bernstein, et al. ImageNet Large Scale Visual Recognition Challenge, 2014, International Journal of Computer Vision.

[15] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.

[16] David A. Forsyth, et al. Editorial: State of the Journal, 2014, IEEE Trans. Pattern Anal. Mach. Intell.

[17] Andrea Vedaldi, et al. MatConvNet: Convolutional Neural Networks for MATLAB, 2014, ACM Multimedia.

[18] Andrew Zisserman, et al. Faces in Places: compound query retrieval, 2016, BMVC.

[19] Tinne Tuytelaars, et al. Cross-Modal Supervision for Learning Active Speaker Detection in Video, 2016, ECCV.

[20] Yann LeCun, et al. Learning a similarity metric discriminatively, with application to face verification, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[21] Matti Pietikäinen, et al. OuluVS2: A multi-view audiovisual database for non-rigid mouth motion analysis, 2015, 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[22] Davis E. King, et al. Dlib-ml: A Machine Learning Toolkit, 2009, J. Mach. Learn. Res.

[23] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[24] A. Murat Tekalp, et al. Audiovisual Synchronization and Fusion Using Canonical Correlation Analysis, 2007, IEEE Transactions on Multimedia.

[25] David F. McAllister, et al. Lip synchronization of speech, 1997, AVSP.

[26] D. Bitzer, et al. Automated lip-sync: direct translation of speech-sound to mouth-shape, 1994, Proceedings of 1994 28th Asilomar Conference on Signals, Systems and Computers.

[27] Satoshi Nakamura, et al. Audio-visual speech translation with automatic lip synchronization and face tracking based on 3-D head model, 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[28] Joon Son Chung, et al. Lip Reading in the Wild, 2016, ACCV.

[29] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[30] Trevor Darrell, et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.