Grounding Spoken Words in Unlabeled Video
Michael Picheny | James R. Glass | Antonio Torralba | Dhiraj Joshi | Samuel Thomas | Dan Gutfreund | David Harwath | Kartik Audhkhasi | Rogerio Feris | Yang Zhang | Angie Boggust
[1] Aren Jansen, et al. Unsupervised Learning of Semantic Audio Representations, 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Grzegorz Chrupala, et al. Representations of language in a model of visually grounded speech signal, 2017, ACL.
[3] James R. Glass, et al. Unsupervised Learning of Spoken Language with Visual Context, 2016, NIPS.
[4] Rogério Schmidt Feris, et al. Learning to Separate Object Sounds by Watching Unlabeled Video, 2018, ECCV.
[5] Apostol Natsev, et al. YouTube-8M: A Large-Scale Video Classification Benchmark, 2016, ArXiv.
[6] Andrew Owens, et al. Ambient Sound Provides Supervision for Visual Learning, 2016, ECCV.
[7] Nuno Vasconcelos, et al. Self-Supervised Generation of Spatial Audio for 360 Video, 2018, NIPS.
[8] Chuang Gan, et al. The Sound of Pixels, 2018, ECCV.
[9] Chenliang Xu, et al. Towards Automatic Learning of Procedures From Web Instructional Videos, 2017, AAAI.
[10] Andrew Zisserman, et al. Look, Listen and Learn, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[11] James R. Glass, et al. Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input, 2018, ECCV.
[12] Antonio Torralba, et al. SoundNet: Learning Sound Representations from Unlabeled Video, 2016, NIPS.
[13] Andrew Owens, et al. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features, 2018, ECCV.
[14] Andrew Zisserman, et al. Seeing Voices and Hearing Faces: Cross-Modal Biometric Matching, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[15] Andrew Zisserman, et al. Objects that Sound, 2017, ECCV.
[16] Chen Fang, et al. Visual to Sound: Generating Natural Sound for Videos in the Wild, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[17] Samy Bengio, et al. Large Scale Online Learning of Image Similarity through Ranking, 2009, IbPRIA.
[18] Andrew Owens, et al. Visually Indicated Sounds, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Tae-Hyun Oh, et al. On Learning Associations of Faces and Voices, 2018, ACCV.
[20] Gregory Shakhnarovich, et al. Visually Grounded Learning of Keyword Prediction from Untranscribed Speech, 2017, INTERSPEECH.