Muhammad Haroon Yousaf | Alessio Del Bue | Ignazio Gallo | Arif Mahmood | Muhammad Saad Saeed | Shah Nawaz | Pietro Morerio
[1] Andrew Zisserman,et al. Learnable PINs: Cross-Modal Embeddings for Person Identity , 2018, ECCV.
[2] Arif Mahmood,et al. Do Cross Modal Systems Leverage Semantic Relationships? , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[3] Joon Son Chung,et al. Utterance-level Aggregation for Speaker Recognition in the Wild , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[4] Ke Chen,et al. Exploring speaker-specific characteristics with deep learning , 2011, The 2011 International Joint Conference on Neural Networks.
[5] Hans-Peter Kriegel,et al. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.
[6] John H. L. Hansen,et al. Spoken language mismatch in speaker verification: An investigation with NIST-SRE and CRSS Bi-Ling corpora , 2014, 2014 IEEE Spoken Language Technology Workshop (SLT).
[7] Omkar M. Parkhi,et al. VGGFace2: A Dataset for Recognising Faces across Pose and Age , 2017, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).
[8] Liang Lu,et al. The effect of language factors for robust speaker recognition , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[9] Joon Son Chung,et al. Out of Time: Automated Lip Sync in the Wild , 2016, ACCV Workshops.
[10] Kevin Walker,et al. Call My Net Corpus: A Multilingual Corpus for Evaluation of Speaker Recognition Technology , 2017, INTERSPEECH.
[11] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[12] Tae-Hyun Oh,et al. On Learning Associations of Faces and Voices , 2018, ACCV.
[13] Roland Auckenthaler,et al. Language dependency in text-independent speaker verification , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[14] Antonio Torralba,et al. SoundNet: Learning Sound Representations from Unlabeled Video , 2016, NIPS.
[15] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[16] Aaron Lawson,et al. The Speakers in the Wild (SITW) Speaker Recognition Database , 2016, INTERSPEECH.
[17] David Miller,et al. The Mixer Corpus of Multilingual, Multichannel Speaker Recognition Data , 2004, LREC.
[18] Andrew Zisserman,et al. Seeing Voices and Hearing Faces: Cross-Modal Biometric Matching , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[19] Huizhong Chen,et al. Residual Enhanced Visual Vectors for on-device image matching , 2011, 2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR).
[20] Bhiksha Raj,et al. Disjoint Mapping Network for Cross-modal Matching of Voices and Faces , 2018, ICLR.
[21] John D E Gabrieli,et al. Human Voice Recognition Depends on Language Ability , 2011, Science.
[22] H. Shimodaira,et al. Improving predictive inference under covariate shift by weighting the log-likelihood function , 2000.
[23] Patrick Kenny,et al. Joint Factor Analysis of Speaker and Session Variability: Theory and Algorithms , 2006.
[24] Yoshua Bengio,et al. Speaker Recognition from Raw Waveform with SincNet , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[25] Qingming Huang,et al. Seeking the Shape of Sound: An Adaptive Framework for Learning Voice-Face Association , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Lukás Burget,et al. Support vector machines and Joint Factor Analysis for speaker verification , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[27] Yu Qiao,et al. A Discriminative Feature Learning Approach for Deep Face Recognition , 2016, ECCV.
[28] Arif Mahmood,et al. Deep Latent Space Learning for Cross-Modal Mapping of Audio and Visual Signals , 2019, 2019 Digital Image Computing: Techniques and Applications (DICTA).
[29] Yun Lei,et al. A novel scheme for speaker recognition using a phonetically-aware deep neural network , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[30] Xuanjing Huang,et al. Adaptive Co-attention Network for Named Entity Recognition in Tweets , 2018, AAAI.
[31] Daniel P. W. Ellis,et al. Tandem acoustic modeling in large-vocabulary recognition , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[32] Andrew Zisserman,et al. Deep Face Recognition , 2015, BMVC.
[33] Jing Huang,et al. Audio-visual deep learning for noise robust speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[34] Yin Li,et al. Learning Deep Structure-Preserving Image-Text Embeddings , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Sergey Ioffe,et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.
[36] Andrew Zisserman,et al. Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.
[37] Joon Son Chung,et al. VoxCeleb2: Deep Speaker Recognition , 2018, INTERSPEECH.
[38] Juhan Nam,et al. Multimodal Deep Learning , 2011, ICML.
[39] Nitish Srivastava,et al. Multimodal learning with deep Boltzmann machines , 2012, J. Mach. Learn. Res.
[40] Ignazio Gallo,et al. Multimodal Classification Fusion in Real-World Scenarios , 2017, 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR).
[41] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Joon Son Chung,et al. Voxceleb: Large-scale speaker verification in the wild , 2020, Comput. Speech Lang.
[43] Francis M. Tyers,et al. Common Voice: A Massively-Multilingual Speech Corpus , 2020, LREC.
[44] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[45] S. Pruzansky. Pattern‐Matching Procedure for Automatic Talker Recognition , 1963.
[46] Yan Yan,et al. Dual Attention Matching for Audio-Visual Event Localization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[47] Andrew Zisserman,et al. Emotion Recognition in Speech using Cross-Modal Transfer in the Wild , 2018, ACM Multimedia.
[48] Naoyuki Kanda,et al. Face-Voice Matching using Cross-modal Embeddings , 2018, ACM Multimedia.
[49] Ignazio Gallo,et al. Git Loss for Deep Face Recognition , 2018, BMVC.
[50] Yu Qiao,et al. Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks , 2016, IEEE Signal Processing Letters.
[51] Ignazio Gallo,et al. Aiding Intra-Text Representations with Visual Context for Multimodal Named Entity Recognition , 2019, 2019 International Conference on Document Analysis and Recognition (ICDAR).
[52] Douglas A. Reynolds,et al. Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process.
[53] Tomas Mikolov,et al. Efficient Large-Scale Multi-Modal Classification , 2018, AAAI.
[54] Wolfgang Wahlster,et al. Verbmobil: Foundations of Speech-to-Speech Translation , 2000, Artificial Intelligence.
[55] Florian Schiel,et al. Verbmobil Data Collection and Annotation , 2000.
[56] Joon Son Chung,et al. VoxCeleb: A Large-Scale Speaker Identification Dataset , 2017, INTERSPEECH.