Crossmodal Sound Retrieval Based on Specific Target Co-Occurrence Denoted with Weak Labels
Masahiro Yasuda | Yasunori Ohishi | Noboru Harada | Yuma Koizumi