Perfect Match: Improved Cross-modal Embeddings for Audio-visual Synchronisation
[1] Alexei A. Efros, et al. Context Encoders: Feature Learning by Inpainting, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Joon Son Chung, et al. Out of Time: Automated Lip Sync in the Wild, 2016, ACCV Workshops.
[3] Tae-Hyun Oh, et al. Learning to Localize Sound Source in Visual Scenes, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[4] Andrew Zisserman, et al. Objects that Sound, 2017, ECCV.
[5] Tae-Hyun Oh, et al. On Learning Associations of Faces and Voices, 2018, ACCV.
[6] Joon Son Chung, et al. Learning to lip read words by watching videos, 2018, Comput. Vis. Image Underst.
[7] Lorenzo Torresani, et al. Co-Training of Audio and Video Representations from Self-Supervised Temporal Synchronization, 2018, ArXiv.
[8] Alexei A. Efros, et al. Colorful Image Colorization, 2016, ECCV.
[9] Andrew Zisserman, et al. Return of the Devil in the Details: Delving Deep into Convolutional Nets, 2014, BMVC.
[10] Shmuel Peleg, et al. Dynamic Temporal Alignment of Speech to Lips, 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Joon Son Chung, et al. Lip Reading in the Wild, 2016, ACCV.
[12] Geoffrey E. Hinton, et al. Reducing the Dimensionality of Data with Neural Networks, 2006, Science.
[13] Joon Son Chung, et al. Lip Reading Sentences in the Wild, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Andrew Owens, et al. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features, 2018, ECCV.
[15] Andrew Zisserman, et al. Seeing Voices and Hearing Faces: Cross-Modal Biometric Matching, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.