FashionKLIP: Enhancing E-Commerce Image-Text Retrieval with Fashion Multi-Modal Conceptual Knowledge Graph
Lei Li | Zhixu Li | Yanghua Xiao | Linbo Jin | Chengyu Wang | Jun Huang | Xiaodan Wang | Ben Chen | Ming Gao
[1] Xiatian Zhu, et al. FashionViL: Fashion-Focused Vision-and-Language Representation Learning, 2022, ECCV.
[2] Minghui Qiu, et al. EasyNLP: A Comprehensive and Easy-to-use Toolkit for Natural Language Processing, 2022, EMNLP.
[3] Tamara L. Berg, et al. CommerceMM: Large-Scale Commerce MultiModal Representation Learning with Omni Retrieval, 2022, KDD.
[4] Hang Li, et al. Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts, 2021, ICML.
[5] Zhenguo Li, et al. FILIP: Fine-grained Interactive Language-Image Pre-Training, 2021, ICLR.
[6] Huajun Chen, et al. Knowledge Perceived Multi-modal Pretraining in E-commerce, 2021, ACM Multimedia.
[7] Zhou Yu, et al. ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration, 2021, ACM Multimedia.
[8] Ling Shao, et al. Kaleido-BERT: Vision-Language Pre-training on Fashion Domain, 2021, CVPR.
[9] Ilya Sutskever, et al. Learning Transferable Visual Models From Natural Language Supervision, 2021, ICML.
[10] Quoc V. Le, et al. Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision, 2021, ICML.
[11] Hua Wu, et al. UNIMO: Towards Unified-Modal Understanding and Generation via Cross-Modal Contrastive Learning, 2020, ACL.
[12] S. Gelly, et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, 2020, ICLR.
[13] Hao Tian, et al. ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph, 2020, AAAI.
[14] Yu Cheng, et al. Large-Scale Adversarial Training for Vision-and-Language Representation Learning, 2020, NeurIPS.
[15] Hao Wang, et al. FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval, 2020, SIGIR.
[16] Jianfeng Gao, et al. Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks, 2020, ECCV.
[17] Yu Cheng, et al. UNITER: UNiversal Image-TExt Representation Learning, 2019, ECCV.
[18] Ming-Wei Chang, et al. Well-Read Students Learn Better: On the Importance of Pre-training Compact Models, 2019.
[19] Mohit Bansal, et al. LXMERT: Learning Cross-Modality Encoder Representations from Transformers, 2019, EMNLP.
[20] Nan Duan, et al. Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training, 2019, AAAI.
[21] Radu Soricut, et al. Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning, 2018, ACL.
[22] Ying Zhang, et al. Fashion-Gen: The Generative Fashion Dataset and Challenge, 2018, ArXiv.
[23] David J. Fleet, et al. VSE++: Improving Visual-Semantic Embeddings with Hard Negatives, 2017, BMVC.
[24] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.
[25] Jeff Johnson, et al. Billion-Scale Similarity Search with GPUs, 2017, IEEE Transactions on Big Data.
[26] Fei-Fei Li, et al. Deep Visual-Semantic Alignments for Generating Image Descriptions, 2015, CVPR.
[27] Pietro Perona, et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.
[28] Peter Young, et al. From Image Descriptions to Visual Denotations: New Similarity Metrics for Semantic Inference over Event Descriptions, 2014, TACL.
[29] Xin Jiang, et al. Wukong: 100 Million Large-scale Chinese Cross-modal Pre-training Dataset and A Foundation Framework, 2022, ArXiv.
[30] Jeff Z. Pan, et al. Construction and Applications of Open Business Knowledge Graph, 2022.
[31] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.
[32] Jade Goldstein-Stewart, et al. The Use of MMR, Diversity-based Reranking for Reordering Documents and Producing Summaries, 1998, SIGIR '98.