Andrew Owens | Joon Son Chung | Andrew Zisserman | Triantafyllos Afouras
[1] Andrew Zisserman,et al. Video Representation Learning by Dense Predictive Coding , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[2] Joon Son Chung,et al. VoxCeleb: A Large-Scale Speaker Identification Dataset , 2017, INTERSPEECH.
[3] Shmuel Peleg,et al. Seeing Through Noise: Visually Driven Speaker Separation And Enhancement , 2017, ICASSP.
[4] Joon Son Chung,et al. LRS3-TED: a large-scale dataset for visual speech recognition , 2018, ArXiv.
[5] Alexei A. Efros,et al. Unsupervised Visual Representation Learning by Context Prediction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[6] Kristen Grauman,et al. Co-Separating Sounds of Visual Objects , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[7] Joon Son Chung,et al. Perfect Match: Improved Cross-modal Embeddings for Audio-visual Synchronisation , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Kevin Wilson,et al. Looking to listen at the cocktail party , 2018, ACM Trans. Graph.
[9] Mubarak Shah,et al. Multimodal Analysis for Identification and Segmentation of Moving-Sounding Objects , 2013, IEEE Transactions on Multimedia.
[10] Joon Son Chung,et al. Deep Audio-Visual Speech Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[11] Andrew Zisserman,et al. Objects that Sound , 2017, ECCV.
[12] Radu Horaud,et al. Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[13] Michael Elad,et al. Pixels that sound , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[14] Yoav Y. Schechner,et al. Harmony in Motion , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.
[15] Vittorio Murino,et al. Voice Activity Detection by Upper Body Motion Analysis and Unsupervised Domain Adaptation , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[16] Oriol Vinyals,et al. Representation Learning with Contrastive Predictive Coding , 2018, ArXiv.
[17] Andrew Owens,et al. Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning , 2017, International Journal of Computer Vision.
[18] Chuang Gan,et al. The Sound of Pixels , 2018, ECCV.
[19] Joon Son Chung,et al. Out of Time: Automated Lip Sync in the Wild , 2016, ACCV Workshops.
[20] Peter V. Gehler,et al. Semantic Video CNNs Through Representation Warping , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[21] Kaiming He,et al. Momentum Contrast for Unsupervised Visual Representation Learning , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Tinne Tuytelaars,et al. Cross-Modal Supervision for Learning Active Speaker Detection in Video , 2016, ECCV.
[23] Larry S. Davis,et al. Look who's talking: speaker detection using video and audio correlation , 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).
[24] Tae-Hyun Oh,et al. Learning to Localize Sound Source in Visual Scenes , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[25] Chuang Gan,et al. Self-Supervised Moving Vehicle Tracking With Stereo Sound , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[26] Laurens van der Maaten,et al. Self-Supervised Learning of Pretext-Invariant Representations , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Andrew Zisserman,et al. Memory-augmented Dense Predictive Coding for Video Representation Learning , 2020, ECCV.
[28] Andrew Zisserman,et al. Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.
[29] Trevor Darrell,et al. Learning Joint Statistical Models for Audio-Visual Fusion and Segregation , 2000, NIPS.
[30] Geoffrey E. Hinton,et al. A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.
[31] Joon Son Chung,et al. Disentangled Speech Embeddings Using Cross-Modal Self-Supervision , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] Stefanos Zafeiriou,et al. RetinaFace: Single-stage Dense Face Localisation in the Wild , 2019, ArXiv.
[33] Abhishek Dutta,et al. The VIA Annotation Software for Images, Audio and Video , 2019, ACM Multimedia.
[34] Ali Razavi,et al. Data-Efficient Image Recognition with Contrastive Predictive Coding , 2019, ICML.
[35] Wei Liu,et al. SSD: Single Shot MultiBox Detector , 2015, ECCV.
[36] Joon Son Chung,et al. Signs in time: Encoding human motion as a temporal image , 2016, ArXiv.
[37] Naji Khosravan,et al. On Attention Modules for Audio-Visual Synchronization , 2018, CVPR Workshops.
[38] Phillip Isola,et al. Contrastive Multiview Coding , 2019, ECCV.
[39] Andrew Zisserman,et al. Flowing ConvNets for Human Pose Estimation in Videos , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[40] Andries P. Hekstra,et al. Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[41] Kristen Grauman,et al. 2.5D Visual Sound , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Irene Kotsia,et al. RetinaFace: Single-Shot Multi-Level Face Localisation in the Wild , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[43] Andrew Owens,et al. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features , 2018, ECCV.
[44] B. Holden. Listen and learn , 2002.
[45] Joon Son Chung,et al. Who said that?: Audio-visual speaker diarisation of real-world meetings , 2019, INTERSPEECH.
[46] Javier R. Movellan,et al. Audio Vision: Using Audio-Visual Synchrony to Locate Sounds , 1999, NIPS.
[47] Andrew Owens,et al. Visually Indicated Sounds , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[48] Lorenzo Torresani,et al. Co-Training of Audio and Video Representations from Self-Supervised Temporal Synchronization , 2018, ArXiv.
[49] Chuang Gan,et al. The Sound of Motions , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[50] Nitish Srivastava. Unsupervised Learning of Visual Representations using Videos , 2015.
[51] Joon Son Chung,et al. My lips are concealed: Audio-visual speech enhancement through obstructions , 2019, INTERSPEECH.
[52] Arkadiusz Stopczynski,et al. Ava Active Speaker: An Audio-Visual Dataset for Active Speaker Detection , 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[53] Chuang Gan,et al. Self-supervised Audio-visual Co-segmentation , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[54] Andrew Zisserman,et al. Look, Listen and Learn , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[55] Rémi Gribonval,et al. BSS_EVAL Toolbox User Guide -- Revision 2.0 , 2005.
[56] Cordelia Schmid,et al. Supplementary Material: AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[57] Joon Son Chung,et al. The Conversation: Deep Audio-Visual Speech Enhancement , 2018, INTERSPEECH.
[58] James R. Glass,et al. Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input , 2018, ECCV.
[59] Sukhendu Das,et al. See the Sound, Hear the Pixels , 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).
[60] Xuelong Li,et al. Deep Multimodal Clustering for Unsupervised Audiovisual Learning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[61] Rogério Schmidt Feris,et al. Learning to Separate Object Sounds by Watching Unlabeled Video , 2018, ECCV.
[62] Chenliang Xu,et al. Audio-Visual Event Localization in Unconstrained Videos , 2018, ECCV.
[63] Joon Son Chung,et al. VoxCeleb2: Deep Speaker Recognition , 2018, INTERSPEECH.