Tanaya Guha | Kyle Min | Subarna Tripathi | Sourya Roy | Somdeb Majumdar