Audio-Visual Speaker Recognition with a Cross-Modal Discriminative Network
[1] Tae-Hyun Oh,et al. On Learning Associations of Faces and Voices , 2018, ACCV.
[2] Kevin Wilson,et al. Looking to listen at the cocktail party , 2018, ACM Trans. Graph.
[3] Xiaoyong Du,et al. Voice-Face Cross-modal Matching and Retrieval: A Benchmark , 2019, ArXiv.
[4] Stefanos Zafeiriou,et al. ArcFace: Additive Angular Margin Loss for Deep Face Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[5] E. Vatikiotis-Bateson,et al. 'Putting the Face to the Voice': Matching Identity across Modality , 2003, Current Biology.
[6] Louis-Philippe Morency,et al. Multimodal Machine Learning: A Survey and Taxonomy , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[7] Lauren Mavica,et al. Matching voice and face identity from static images , 2013, Journal of Experimental Psychology: Human Perception and Performance.
[8] Arif Mahmood,et al. Deep Latent Space Learning for Cross-Modal Mapping of Audio and Visual Signals , 2019, 2019 Digital Image Computing: Techniques and Applications (DICTA).
[9] Yu Qiao,et al. Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks , 2016, IEEE Signal Processing Letters.
[10] Elliot Singer,et al. The 2019 NIST Audio-Visual Speaker Recognition Evaluation , 2020.
[11] Shuo Yang,et al. WIDER FACE: A Face Detection Benchmark , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Tae-Hyun Oh,et al. Speech2Face: Learning the Face Behind a Voice , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Abhishek Shrivastava,et al. SpeechMarker: A Voice Based Multi-Level Attendance Application , 2019, INTERSPEECH.
[14] Sanjeev Khudanpur,et al. X-Vectors: Robust DNN Embeddings for Speaker Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Andrew Zisserman,et al. Learnable PINs: Cross-Modal Embeddings for Person Identity , 2018, ECCV.
[16] Stefanos Zafeiriou,et al. RetinaFace: Single-stage Dense Face Localisation in the Wild , 2019, ArXiv.
[17] Andreas Kleinschmidt,et al. Interaction of Face and Voice Areas during Speaker Recognition , 2005, Journal of Cognitive Neuroscience.
[18] Naoyuki Kanda,et al. Face-Voice Matching using Cross-modal Embeddings , 2018, ACM Multimedia.
[19] Sanjeev Khudanpur,et al. State-of-the-Art Speaker Recognition for Telephone and Video Speech: The JHU-MIT Submission for NIST SRE18 , 2019, INTERSPEECH.
[20] Joon Son Chung,et al. VoxCeleb2: Deep Speaker Recognition , 2018, INTERSPEECH.
[21] S. R. Mahadeva Prasanna,et al. Development of Multi-Level Speech based Person Authentication System , 2017, J. Signal Process. Syst.
[22] H. M. J. Smith,et al. Matching novel face and voice identity using static and dynamic facial images , 2016, Attention, Perception, & Psychophysics.
[23] Yuxiao Hu,et al. MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition , 2016, ECCV.
[24] Niko Brümmer,et al. The BOSARIS Toolkit: Theory, Algorithms and Code for Surviving the New DCF , 2013, ArXiv.
[25] Douglas A. Reynolds,et al. Two decades of speaker recognition evaluation at the National Institute of Standards and Technology , 2020, Comput. Speech Lang.
[26] Malcolm Slaney,et al. Putting a Face to the Voice: Fusing Audio and Visual Signals Across a Video to Determine Speakers , 2017, ArXiv.
[27] Bhiksha Raj,et al. Disjoint Mapping Network for Cross-modal Matching of Voices and Faces , 2018, ICLR.
[28] Paula C. Stacey,et al. Concordant Cues in Faces and Voices , 2016, Evolutionary Psychology.
[29] John H. L. Hansen,et al. I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences , 2019, INTERSPEECH.
[30] Bin Ma,et al. Joint Application of Speech and Speaker Recognition for Automation and Security in Smart Home , 2011, INTERSPEECH.
[31] Tetsuya Ogata,et al. Audio-visual speech recognition using deep learning , 2014, Applied Intelligence.
[32] Sanjeev Khudanpur,et al. Speaker Recognition for Multi-speaker Conversations Using X-vectors , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[33] Andrew Zisserman,et al. Seeing Voices and Hearing Faces: Cross-Modal Biometric Matching , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.