暂无分享,去创建一个
[1] Alexei A. Efros,et al. Colorful Image Colorization , 2016, ECCV.
[2] Lorenzo Torresani,et al. Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization , 2018, NeurIPS.
[3] Tomasz Malisiewicz,et al. SuperPoint: Self-Supervised Interest Point Detection and Description , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[4] Andrew Zisserman,et al. Objects that Sound , 2017, ECCV.
[5] Joon Son Chung,et al. Lip Reading Sentences in the Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Stefanos Zafeiriou,et al. ArcFace: Additive Angular Margin Loss for Deep Face Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Tae-Hyun Oh,et al. On Learning Associations of Faces and Voices , 2018, ACCV.
[9] Ricardo Vigário,et al. Self-Supervised MRI Tissue Segmentation by Discriminative Clustering , 2014, Int. J. Neural Syst..
[10] Douglas A. Reynolds,et al. An overview of automatic speaker recognition technology , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[11] Joon Son Chung,et al. VoxCeleb: A Large-Scale Speaker Identification Dataset , 2017, INTERSPEECH.
[12] Aggelos K. Katsaggelos,et al. Audio-Visual Biometrics , 2006, Proceedings of the IEEE.
[13] P. Belin,et al. Thinking the voice: neural correlates of voice perception , 2004, Trends in Cognitive Sciences.
[14] Xiaolin Hu,et al. Recurrent convolutional neural network for object recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.
[16] Alexei A. Efros,et al. Context Encoders: Feature Learning by Inpainting , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Andrew Owens,et al. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features , 2018, ECCV.
[18] Andrew Zisserman,et al. Seeing Voices and Hearing Faces: Cross-Modal Biometric Matching , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[19] Joon Son Chung,et al. Learning to lip read words by watching videos , 2018, Comput. Vis. Image Underst..
[20] Ke Gong,et al. Look into Person: Self-Supervised Structure-Sensitive Learning and a New Benchmark for Human Parsing , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[22] Quan Wang,et al. Generalized End-to-End Loss for Speaker Verification , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Joon Son Chung,et al. Out of Time: Automated Lip Sync in the Wild , 2016, ACCV Workshops.
[24] Kihyuk Sohn,et al. Improved Deep Metric Learning with Multi-class N-pair Loss Objective , 2016, NIPS.
[25] L. Shams,et al. Crossmodal influences on visual perception. , 2010, Physics of life reviews.
[26] Andrew Zisserman,et al. Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.
[27] Andrew Zisserman,et al. Multi-task Self-Supervised Visual Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[28] Lorenzo Torresani,et al. Co-Training of Audio and Video Representations from Self-Supervised Temporal Synchronization , 2018, ArXiv.
[29] Xing Ji,et al. CosFace: Large Margin Cosine Loss for Deep Face Recognition , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[30] Joon Son Chung,et al. Signs in time: Encoding human motion as a temporal image , 2016, ArXiv.
[31] Alexei A. Efros,et al. Unsupervised Visual Representation Learning by Context Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[32] Andrew Owens,et al. Self-Supervised Learning of Audio-Visual Objects from Video , 2020, ECCV.
[33] Joon Son Chung,et al. Disentangled Speech Embeddings Using Cross-Modal Self-Supervision , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[34] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[35] Phillip Isola,et al. Contrastive Multiview Coding , 2019, ECCV.
[36] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[37] Geoffrey E. Hinton,et al. Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[38] Andrew Zisserman,et al. Learnable PINs: Cross-Modal Embeddings for Person Identity , 2018, ECCV.
[39] Tae-Hyun Oh,et al. Learning to Localize Sound Source in Visual Scenes , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[40] Joon Son Chung,et al. In defence of metric learning for speaker recognition , 2020, INTERSPEECH.
[41] Joon Son Chung,et al. VoxCeleb2: Deep Speaker Recognition , 2018, INTERSPEECH.
[42] Tae-Hyun Oh,et al. Speech2Face: Learning the Face Behind a Voice , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).