Interpreting Vision and Language Generative Models with Semantic Visual Priors