Babel-ImageNet: Massively Multilingual Evaluation of Vision-and-Language Representations
[1] Alexander Kolesnikov,et al. Sigmoid Loss for Language Image Pre-Training , 2023, ArXiv.
[2] S. Savarese,et al. BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models , 2023, ICML.
[3] Alexander W. Fang,et al. Does progress on ImageNet transfer to real-world datasets? , 2023, Neural Information Processing Systems.
[4] Ledell Yu Wu,et al. AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities , 2022, ACL.
[5] Jingren Zhou,et al. Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese , 2022, ArXiv.
[6] Ludwig Schmidt,et al. LAION-5B: An open large-scale dataset for training next generation image-text models , 2022, NeurIPS.
[7] Shannon L. Spruit,et al. No Language Left Behind: Scaling Human-Centered Machine Translation , 2022, ArXiv.
[8] Wangchunshu Zhou,et al. Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training , 2022, ACL.
[9] Liang Zhang,et al. Generalizing Multimodal Pre-training into Multilingual via Language Acquisition , 2022, ArXiv.
[10] Ashish V. Thapliyal,et al. Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset , 2022, EMNLP.
[11] S. Hoi,et al. BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation , 2022, ICML.
[12] Siva Reddy,et al. IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages , 2022, ICML.
[13] B. Ommer,et al. High-Resolution Image Synthesis with Latent Diffusion Models , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[14] A. Frank,et al. MAGMA - Multimodal Augmentation of Generative Models through Adapter-based Finetuning , 2021, EMNLP.
[15] Quoc V. Le,et al. Combined Scaling for Zero-shot Transfer Learning , 2021, Neurocomputing.
[16] Daniel Keysers,et al. LiT: Zero-Shot Transfer with Locked-image text Tuning , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Jenia Jitsev,et al. LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs , 2021, ArXiv.
[18] Nigel Collier,et al. Visually Grounded Reasoning across Languages and Cultures , 2021, EMNLP.
[19] Jan-Martin O. Steitz,et al. xGQA: Cross-Lingual Visual Question Answering , 2021, FINDINGS.
[20] Silvia Terragni,et al. Contrastive Language-Image Pre-training for the Italian Language , 2021, CLiC-it.
[21] Yelong Shen,et al. LoRA: Low-Rank Adaptation of Large Language Models , 2021, ICLR.
[22] Danqi Chen,et al. SimCSE: Simple Contrastive Learning of Sentence Embeddings , 2021, EMNLP.
[23] Jingjing Liu,et al. UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Jonas Mueller,et al. Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks , 2021, NeurIPS Datasets and Benchmarks.
[25] Nils Reimers,et al. Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval , 2021, TACL.
[26] Jiecao Chen,et al. WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning , 2021, SIGIR.
[27] Ilya Sutskever,et al. Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.
[28] Quoc V. Le,et al. Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision , 2021, ICML.
[29] Pranav Aggarwal,et al. Towards Zero-shot Cross-lingual Image Retrieval , 2020, ArXiv.
[30] Goran Glavaš,et al. From Zero to Hero: On the Limitations of Zero-Shot Language Transfer with Multilingual Transformers , 2020, EMNLP.
[31] S. Gelly,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.
[32] Iryna Gurevych,et al. AdapterHub: A Framework for Adapting Transformers , 2020, EMNLP.
[33] Jianfeng Gao,et al. M3P: Learning Universal Representations via Multitask Multilingual Multimodal Pre-training , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Iryna Gurevych,et al. MAD-X: An Adapter-based Framework for Multi-task Cross-lingual Transfer , 2020, EMNLP.
[35] Iryna Gurevych,et al. Making Monolingual Sentence Embeddings Multilingual Using Knowledge Distillation , 2020, EMNLP.
[36] Bryan A. Plummer,et al. Learning to Scale Multilingual Representations for Vision-Language Tasks , 2020, ECCV.
[37] Orhan Firat,et al. XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization , 2020, ICML.
[38] Myle Ott,et al. Unsupervised Cross-lingual Representation Learning at Scale , 2019, ACL.
[39] Jonatas Wehrmann,et al. Language-Agnostic Visual-Semantic Embeddings , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[40] Bryan A. Plummer,et al. MULE: Multimodal Universal Language Embedding , 2019, AAAI.
[41] Holger Schwenk,et al. WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia , 2019, EACL.
[42] Laurens van der Maaten,et al. Does Object Recognition Work for Everyone? , 2019, CVPR Workshops.
[43] Benjamin Recht,et al. Do ImageNet Classifiers Generalize to ImageNet? , 2019, ICML.
[44] Mona Attariyan,et al. Parameter-Efficient Transfer Learning for NLP , 2019, ICML.
[45] Desmond Elliott,et al. Findings of the Third Shared Task on Multimodal Machine Translation , 2018, WMT.
[46] Xirong Li,et al. COCO-CN for Cross-Lingual Image Tagging, Captioning, and Retrieval , 2018, IEEE Transactions on Multimedia.
[47] Finn Årup Nielsen,et al. Linking ImageNet WordNet Synsets with Wikidata , 2018, WWW.
[48] D. Sculley,et al. No Classification without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World , 2017, arXiv:1711.08536.
[49] Frank Hutter,et al. Decoupled Weight Decay Regularization , 2017, ICLR.
[50] Desmond Elliott,et al. Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description , 2017, WMT.
[51] Frank Keller,et al. Image Pivoting for Learning Multilingual Multimodal Representations , 2017, EMNLP.
[52] David J. Fleet,et al. VSE++: Improving Visual-Semantic Embeddings with Hard Negatives , 2017, BMVC.
[53] Akikazu Takeuchi,et al. STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset , 2017, ACL.
[54] Khalil Sima'an,et al. Multi30K: Multilingual English-German Image Descriptions , 2016, VL@ACL.
[55] Svetlana Lazebnik,et al. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models , 2015, International Journal of Computer Vision.
[56] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[57] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[58] C. V. Jawahar,et al. Cats and dogs , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[59] Denny Vrandecic,et al. Wikidata: a new platform for collaborative data collection , 2012, WWW.
[60] Simone Paolo Ponzetto,et al. BabelNet: Building a Very Large Multilingual Semantic Network , 2010, ACL.
[61] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[62] Andrew Zisserman,et al. Automated Flower Classification over a Large Number of Classes , 2008, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing.
[63] George A. Miller,et al. WordNet: A Lexical Database for English , 1995, HLT.
[64] F. Carlsson,et al. Cross-lingual and Multilingual CLIP , 2022, LREC.
[65] Jason Baldridge,et al. MURAL: Multimodal, Multitask Representations Across Languages , 2021, EMNLP.
[66] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[67] Helmut Feldweg,et al. GermaNet - a Lexical-Semantic Net for German , 1997.