Tae-Hyun Oh | Wojciech Matusik | Changil Kim | Alexandre Kaspar | Mohamed A. Elgharib | Hijung Valentina Shin
[1] B. Kabanoff, et al. Eye movements in auditory space perception, 1975.
[2] H. McGurk, et al. Hearing lips and seeing voices, 1976, Nature.
[3] B. R. Shelton, et al. The influence of vision on the absolute identification of sound-source position, 1980, Perception & psychophysics.
[4] Kurt Hornik, et al. Approximation capabilities of multilayer feedforward networks, 1991, Neural Networks.
[5] William W. Gaver. What in the World Do We Hear? An Ecological Approach to Auditory Event Perception, 1993.
[6] D. Lewkowicz, et al. Three-month-old infants learn arbitrary auditory–visual pairings between voices and faces, 2001.
[7] B. Holden. Listen and learn, 2002.
[8] E. Vatikiotis-Bateson, et al. 'Putting the Face to the Voice': Matching Identity across Modality, 2003, Current Biology.
[9] D. Pisoni, et al. Crossmodal Source Identification in Speech Perception, 2004, Ecological psychology: a publication of the International Society for Ecological Psychology.
[10] Andreas Kleinschmidt, et al. Interaction of Face and Voice Areas during Speaker Recognition, 2005, Journal of Cognitive Neuroscience.
[11] Yann LeCun, et al. Learning a similarity metric discriminatively, with application to face verification, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[12] S. Campanella, et al. Integrating face and voice in person perception, 2007, Trends in Cognitive Sciences.
[13] Geoffrey E. Hinton, et al. Visualizing Data using t-SNE, 2008.
[14] Simon Lucey, et al. Face alignment through subspace constrained mean-shifts, 2009, 2009 IEEE 12th International Conference on Computer Vision.
[15] Alexei A. Efros, et al. Unbiased look at dataset bias, 2011, CVPR 2011.
[16] O. Pascalis, et al. Spontaneous voice–face identity matching by rhesus monkeys for familiar conspecifics and humans, 2011, Proceedings of the National Academy of Sciences.
[17] Juhan Nam, et al. Multimodal Deep Learning, 2011, ICML.
[18] S. Campanella, et al. Cross-modal interactions between human faces and voices involved in person recognition, 2011, Cortex.
[19] Lauren Mavica, et al. Matching voice and face identity from static images, 2013, Journal of experimental psychology. Human perception and performance.
[20] S. Campanella, et al. Integrating face and voice in person perception, 2007, Trends in Cognitive Sciences.
[21] Andrew Zisserman, et al. Deep Face Recognition, 2015, BMVC.
[22] Nir Ailon, et al. Deep Metric Learning Using Triplet Network, 2014, SIMBAD.
[23] Xiaogang Wang, et al. Deep Learning Face Attributes in the Wild, 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[24] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[25] M. Grabowecky, et al. Learned face–voice pairings facilitate visual search, 2015, Psychonomic bulletin & review.
[26] Andrew Zisserman, et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[27] Radu Horaud, et al. Tracking the Active Speaker Based on a Joint Audio-Visual Observation Model, 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).
[28] Alexei A. Efros, et al. Unsupervised Visual Representation Learning by Context Prediction, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[29] Leonard S. Peperkoorn, et al. Revisiting the Red Effect on Attractiveness and Sexual Receptivity: No Effect of the Color Red on Human Mate Preferences, 2016, Evolutionary Psychology.
[30] Jean Charles Bazin, et al. Suggesting Sounds for Images from Video Collections, 2016, ECCV Workshops.
[31] Paula C. Stacey, et al. Concordant Cues in Faces and Voices, 2016.
[32] Sanja Fidler, et al. Order-Embeddings of Images and Language, 2015, ICLR.
[33] Antonio Torralba, et al. SoundNet: Learning Sound Representations from Unlabeled Video, 2016, NIPS.
[34] Chen Huang, et al. Local Similarity-Aware Deep Feature Embedding, 2016, NIPS.
[35] H. M. J. Smith, et al. Matching novel face and voice identity using static and dynamic facial images, 2016, Attention, perception & psychophysics.
[36] Antonio Torralba, et al. Anticipating Visual Representations from Unlabeled Video, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[37] Andrew Owens, et al. Ambient Sound Provides Supervision for Visual Learning, 2016, ECCV.
[38] Andrew Owens, et al. Visually Indicated Sounds, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Victor S. Lempitsky, et al. Learning Deep Embeddings with Histogram Loss, 2016, NIPS.
[40] Trevor Darrell, et al. Adversarial Discriminative Domain Adaptation, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Kaiqi Huang, et al. Beyond Triplet Loss: A Deep Quadruplet Network for Person Re-identification, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Bolei Zhou, et al. Network Dissection: Quantifying Interpretability of Deep Visual Representations, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[43] Malcolm Slaney, et al. Putting a Face to the Voice: Fusing Audio and Visual Signals Across a Video to Determine Speakers, 2017, ArXiv.
[44] Ira Kemelmacher-Shlizerman, et al. Synthesizing Obama, 2017, ACM Trans. Graph.
[45] Joon Son Chung, et al. VoxCeleb: A Large-Scale Speaker Identification Dataset, 2017, INTERSPEECH.
[46] Jaakko Lehtinen, et al. Audio-driven facial animation by joint end-to-end learning of pose and emotion, 2017, ACM Trans. Graph.
[47] Andrew Zisserman, et al. Look, Listen and Learn, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[48] Joon Son Chung, et al. Lip Reading Sentences in the Wild, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[49] Yisong Yue, et al. A deep learning approach for generalized speech animation, 2017, ACM Trans. Graph.
[50] Larry S. Davis, et al. Deception Detection in Videos, 2017, AAAI.
[51] Andrew Zisserman, et al. Seeing Voices and Hearing Faces: Cross-Modal Biometric Matching, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[52] Tae-Hyun Oh, et al. Learning to Localize Sound Source in Visual Scenes, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.