Hierarchical Similarity Learning for Language-Based Product Image Retrieval
Shouling Ji | Jianfeng Dong | Yuan He | Xiaoye Qu | Zhe Ma | Fenghao Liu
[1] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[2] Huchuan Lu, et al. Deep Cross-Modal Projection Learning for Image-Text Matching, 2018, ECCV.
[3] Xirong Li, et al. W2VV++: Fully Deep Learning for Ad-hoc Video Search, 2019, ACM Multimedia.
[4] Xirong Li, et al. Dual Encoding for Zero-Example Video Retrieval, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Tat-Seng Chua, et al. Interpretable Fashion Matching with Rich Attributes, 2019, SIGIR.
[6] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[7] Samy Bengio, et al. Show and tell: A neural image caption generator, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[9] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[10] Mohit Bansal, et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers, 2019, EMNLP.
[11] Zhou Yu, et al. Deep Modular Co-Attention Networks for Visual Question Answering, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Xi Chen, et al. Stacked Cross Attention for Image-Text Matching, 2018, ECCV.
[13] Jianfeng Dong, et al. Fine-Grained Fashion Similarity Learning by Attribute-Specific Embedding Network, 2020, AAAI.
[14] Tat-Seng Chua, et al. Tree-Augmented Cross-Modal Encoding for Complex-Query Video Retrieval, 2020, SIGIR.
[15] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.
[16] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.
[17] Jianfeng Dong. Cross-media Relevance Computation for Multimedia Retrieval, 2017, ACM Multimedia.
[18] Xin Huang, et al. An Overview of Cross-Media Retrieval: Concepts, Methodologies, Benchmarks, and Challenges, 2017, IEEE Transactions on Circuits and Systems for Video Technology.
[19] Roger Levy, et al. A new approach to cross-modal multimedia retrieval, 2010, ACM Multimedia.
[20] Cho-Jui Hsieh, et al. VisualBERT: A Simple and Performant Baseline for Vision and Language, 2019, ArXiv.
[21] David J. Fleet, et al. VSE++: Improving Visual-Semantic Embeddings with Hard Negatives, 2017, BMVC.
[22] Stefan Lee, et al. ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks, 2019, NeurIPS.
[23] David A. Forsyth, et al. Learning Type-Aware Embeddings for Fashion Compatibility, 2018, ECCV.
[24] Shaogang Gong, et al. Image Search With Text Feedback by Visiolinguistic Attention Learning, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).