UniCon: Unified Context Network for Robust Active Speaker Detection
Shiguang Shan | Zhongqin Wu | Xilin Chen | Xiao Liu | Yuanhang Zhang | Susan Liang | Shuang Yang
[1] Andrew Owens, et al. Self-Supervised Learning of Audio-Visual Objects from Video, 2020, ECCV.
[2] Radu Horaud, et al. Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[3] Joon Son Chung, et al. Perfect Match: Improved Cross-modal Embeddings for Audio-visual Synchronisation, 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Jian Sun, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[5] Kevin Wilson, et al. Looking to listen at the cocktail party, 2018, ACM Trans. Graph.
[6] Masafumi Nishida, et al. Turn-alignment using eye-gaze and speech in conversational interaction, 2010, INTERSPEECH.
[7] James Philbin, et al. FaceNet: A unified embedding for face recognition and clustering, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Jason Weston, et al. Curriculum learning, 2009, ICML '09.
[9] Vittorio Murino, et al. Voice Activity Detection by Upper Body Motion Analysis and Unsupervised Domain Adaptation, 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[10] Irene Kotsia, et al. RetinaFace: Single-Shot Multi-Level Face Localisation in the Wild, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Rainer Lienhart, et al. Reliable Transition Detection in Videos: A Survey and Practitioner's Guide, 2001, Int. J. Image Graph.
[12] Joon Son Chung, et al. Out of Time: Automated Lip Sync in the Wild, 2016, ACCV Workshops.
[13] Bernard Ghanem, et al. Active Speakers in Context, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] Larry S. Davis, et al. Look who's talking: speaker detection using video and audio correlation, 2000, 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532).
[15] Hervé Bourlard, et al. Audio-visual synchronisation for speaker diarisation, 2010, INTERSPEECH.
[16] Vittorio Murino, et al. RealVAD: A Real-World Dataset and A Method for Voice Activity Detection by Body Motion Analysis, 2021, IEEE Transactions on Multimedia.
[17] Adam Kirk, et al. Multimodal Active Speaker Detection and Virtual Cinematography for Video Conferencing, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Joon Son Chung, et al. VoxCeleb2: Deep Speaker Recognition, 2018, INTERSPEECH.
[19] Tinne Tuytelaars, et al. Cross-Modal Supervision for Learning Active Speaker Detection in Video, 2016, ECCV.
[20] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[21] Andrew Zisserman, et al. LAEO-Net: Revisiting People Looking at Each Other in Videos, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Kazuhito Koishida, et al. Improved Active Speaker Detection based on Optical Flow, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[23] Arkadiusz Stopczynski, et al. Ava Active Speaker: An Audio-Visual Dataset for Active Speaker Detection, 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Vittorio Murino, et al. S-VVAD: Visual Voice Activity Detection by Motion Segmentation, 2021, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).
[25] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.
[26] Francisco Madrigal, et al. Audio-Video detection of the active speaker in meetings, 2021, 2020 25th International Conference on Pattern Recognition (ICPR).
[27] Radu Horaud, et al. Active-speaker detection and localization with microphones and cameras embedded into a robotic head, 2013, 2013 13th IEEE-RAS International Conference on Humanoid Robots (Humanoids).
[28] Gregory Gelly, et al. Improving Speaker Diarization of TV Series using Talking-Face Detection and Clustering, 2016, ACM Multimedia.
[29] Andrew Zisserman, et al. Video Action Transformer Network, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Dong Wang, et al. CN-Celeb: A Challenging Chinese Speaker Recognition Dataset, 2019, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[31] Malcolm Slaney, et al. Using audio-visual information to understand speaker activity: Tracking active speakers on and off screen, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] S. Shan, et al. Multi-Task Learning for Audio-Visual Active Speaker Detection, 2019.
[33] Yong Xu, et al. Self-Supervised Learning for Audio-Visual Speaker Diarization, 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[34] Joon Son Chung, et al. Spot the conversation: speaker diarisation in the wild, 2020, INTERSPEECH.
[35] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[36] Malcolm Slaney, et al. FaceSync: A Linear Operator for Measuring Synchronization of Video Facial Images and Audio Tracks, 2000, NIPS.
[37] Jean-Marc Odobez, et al. Investigating the use of visual focus of attention for audio-visual speaker diarisation, 2009, MM '09.
[38] A. Kendon. Some functions of gaze-direction in social interaction, 1967, Acta psychologica.
[39] Zheng Shou, et al. Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization, 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Joon Son Chung. Naver at ActivityNet Challenge 2019 - Task B Active Speaker Detection (AVA), 2019, ArXiv.
[42] Zhanghui Kuang, et al. Context-Aware RCNN: A Baseline for Action Detection in Videos, 2020, ECCV.