Audio-Visual Deep Neural Network for Robust Person Verification
[1] Shuai Wang,et al. Generative Adversarial Networks based X-vector Augmentation for Robust Probabilistic Linear Discriminant Analysis in Speaker Verification , 2018, 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP).
[2] Trevor Darrell,et al. Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding , 2016, EMNLP.
[3] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Ming Yang,et al. DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[5] Charles C. Broun,et al. Using lip features for multimodal speaker verification , 2001, Odyssey.
[6] Kai Yu,et al. Discriminative Neural Embedding Learning for Short-Duration Text-Independent Speaker Verification , 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[7] James Philbin,et al. FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Naoyuki Kanda,et al. Face-Voice Matching using Cross-modal Embeddings , 2018, ACM Multimedia.
[9] Chenda Li,et al. Deep Audio-Visual Speech Separation with Attention Mechanism , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] Tae-Hyun Oh,et al. Noise-tolerant Audio-visual Online Person Verification Using an Attention-based Neural Network Fusion , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Aren Jansen,et al. Audio Set: An ontology and human-labeled dataset for audio events , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Fabio A. González,et al. Gated Multimodal Units for Information Fusion , 2017, ICLR.
[13] Yanmin Qian,et al. Listen, Watch and Understand at the Cocktail Party: Audio-Visual-Contextual Speech Separation , 2020, INTERSPEECH.
[14] Xiaogang Wang,et al. Deep Learning Face Representation from Predicting 10,000 Classes , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[15] Erik McDermott,et al. Deep neural networks for small footprint text-dependent speaker verification , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Stefanos Zafeiriou,et al. ArcFace: Additive Angular Margin Loss for Deep Face Recognition , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Tae-Hyun Oh,et al. On Learning Associations of Faces and Voices , 2018, ACCV.
[18] Marwan Mattar,et al. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments , 2008 .
[19] Sanjeev Khudanpur,et al. X-Vectors: Robust DNN Embeddings for Speaker Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Yu Qiao,et al. Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks , 2016, IEEE Signal Processing Letters.
[21] Andrew Zisserman,et al. Seeing Voices and Hearing Faces: Cross-Modal Biometric Matching , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[22] Yann LeCun,et al. Dimensionality Reduction by Learning an Invariant Mapping , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).
[23] Themos Stafylakis,et al. Combining Residual Networks with LSTMs for Lipreading , 2017, INTERSPEECH.
[24] Douglas A. Reynolds,et al. Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process.
[25] Kevin Wilson,et al. Looking to listen at the cocktail party , 2018, ACM Trans. Graph.
[26] Matti Pietikäinen,et al. Learning Discriminant Face Descriptor , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[27] Shuai Wang,et al. BUT System Description to VoxCeleb Speaker Recognition Challenge 2019 , 2019, ArXiv.
[28] Maja Pantic,et al. End-to-End Audiovisual Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[29] Sanjeev Khudanpur,et al. Deep Neural Network Embeddings for Text-Independent Speaker Verification , 2017, INTERSPEECH.
[30] Yanmin Qian,et al. Data Augmentation Using Deep Generative Models for Embedding Based Speaker Recognition , 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[31] Xiaogang Wang,et al. Deep Learning Face Representation by Joint Identification-Verification , 2014, NIPS.
[32] Taghi M. Khoshgoftaar,et al. A survey on Image Data Augmentation for Deep Learning , 2019, Journal of Big Data.
[33] Muhammad Haroon Yousaf,et al. Cross-modal Speaker Verification and Recognition: A Multilingual Perspective , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
[34] Joon Son Chung,et al. VoxCeleb: A Large-Scale Speaker Identification Dataset , 2017, INTERSPEECH.
[35] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[36] Ya Zhang,et al. Deep feature for text-dependent speaker verification , 2015, Speech Commun.
[37] Enhua Wu,et al. Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[38] Joon Son Chung,et al. VoxCeleb2: Deep Speaker Recognition , 2018, INTERSPEECH.
[39] Patrick Kenny,et al. Front-End Factor Analysis for Speaker Verification , 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[40] Shuai Wang,et al. Multi-Modality Matters: A Performance Leap on VoxCeleb , 2020, INTERSPEECH.
[41] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[42] Shuai Wang,et al. Margin Matters: Towards More Discriminative Deep Neural Network Embeddings for Speaker Recognition , 2019, 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
[43] Joon Son Chung,et al. Deep Audio-Visual Speech Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[44] Chunlei Zhang,et al. End-to-End Text-Independent Speaker Verification with Triplet Loss on Short Utterances , 2017, INTERSPEECH.
[45] Andrew Zisserman,et al. Deep Face Recognition , 2015, BMVC.
[46] Themos Stafylakis,et al. Analysis of ABC Submission to NIST SRE 2019 CMN and VAST Challenge , 2020 .
[47] Xiaogang Wang,et al. Deeply learned face representations are sparse, selective, and robust , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[48] Kevin Duh,et al. Audio-Visual Person Recognition in Multimedia Data From the Iarpa Janus Program , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[49] Xiaogang Wang,et al. DeepID3: Face Recognition with Very Deep Neural Networks , 2015, ArXiv.
[50] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[51] Jian Sun,et al. Face recognition with learning-based descriptor , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[52] Elliot Singer,et al. The 2019 NIST Audio-Visual Speaker Recognition Evaluation , 2020 .
[53] Daniel Povey,et al. MUSAN: A Music, Speech, and Noise Corpus , 2015, ArXiv.