[1] Tim Miller, et al. Explanation in Artificial Intelligence: Insights from the Social Sciences, 2017, Artif. Intell.
[2] Andrew Slavin Ross, et al. Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations, 2017, IJCAI.
[3] Timo Freiesleben, et al. The Intriguing Relation Between Counterfactual Explanations and Adversarial Examples, 2020, Minds and Machines.
[4] François Laviolette, et al. Domain-Adversarial Training of Neural Networks, 2015, J. Mach. Learn. Res.
[5] Yuval Pinter, et al. Attention is not not Explanation, 2019, EMNLP.
[6] Yulia Tsvetkov, et al. Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions, 2020, ACL.
[7] Yoav Goldberg, et al. Aligning Faithful Interpretations with their Social Attribution, 2020, arXiv.
[8] Percy Liang, et al. Understanding Black-box Predictions via Influence Functions, 2017, ICML.
[9] Samuel R. Bowman, et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, 2017, NAACL.
[10] Christopher Potts, et al. A large annotated corpus for learning natural language inference, 2015, EMNLP.
[11] Shubham Rathi, et al. Generating Counterfactual and Contrastive Explanations using SHAP, 2019, arXiv.
[12] Ilia Stepin, et al. A Survey of Contrastive and Counterfactual Explanation Generation Methods for Explainable Artificial Intelligence, 2021, IEEE Access.
[13] Toniann Pitassi, et al. Learning Fair Representations, 2013, ICML.
[14] Alexandra Chouldechova, et al. Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting, 2019, FAT.
[15] Yoav Goldberg, et al. Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals, 2021, Transactions of the Association for Computational Linguistics.
[16] Alexandra Chouldechova, et al. What’s in a Name? Reducing Bias in Bios without Access to Protected Attributes, 2019, NAACL.
[17] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, arXiv.
[18] Ido Dagan, et al. The Third PASCAL Recognizing Textual Entailment Challenge, 2007, ACL-PASCAL@ACL.
[19] Jeffrey Heer, et al. Polyjuice: Automated, General-purpose Counterfactual Generation, 2021, arXiv.
[20] Noah Goodman, et al. Investigating Transferability in Pretrained Language Models, 2020, EMNLP.
[21] Ana Marasović, et al. Explaining NLP Models via Minimal Contrastive Editing (MiCE), 2021, Findings.
[22] Suresh Venkatasubramanian, et al. Problems with Shapley-value-based explanations as feature importance measures, 2020, ICML.
[23] Tim Miller, et al. Contrastive explanation: a structural-model approach, 2018, The Knowledge Engineering Review.
[24] Carlos Guestrin, et al. "Why Should I Trust You?": Explaining the Predictions of Any Classifier, 2016, arXiv.
[25] Yoav Goldberg, et al. Towards Faithfully Interpretable NLP Systems: How Should We Define and Evaluate Faithfulness?, 2020, ACL.
[26] Andrew Zisserman, et al. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, 2013, ICLR.
[27] Daniel S. Weld, et al. Data Staining: A Method for Comparing Faithfulness of Explainers, 2020.
[28] Max Welling, et al. Visualizing Deep Neural Network Decisions: Prediction Difference Analysis, 2017, ICLR.
[29] Carolyn Penstein Rosé, et al. Stress Test Evaluation for Natural Language Inference, 2018, COLING.
[30] Uri Shalit, et al. CausaLM: Causal Model Explanation Through Counterfactual Language Models, 2020, CL.
[31] Yoav Goldberg, et al. Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection, 2020, ACL.
[32] Noah A. Smith, et al. Evaluating Models’ Local Decision Boundaries via Contrast Sets, 2020, Findings.
[33] Trevor Darrell, et al. Generating Counterfactual Explanations with Natural Language, 2018, ICML.
[34] Yoav Goldberg, et al. Where’s My Head? Definition, Data Set, and Models for Numeric Fused-Head Identification and Resolution, 2019, Transactions of the Association for Computational Linguistics.
[35] Sungroh Yoon, et al. Interpretation of NLP Models through Input Marginalization, 2020, EMNLP.
[36] Eduard Hovy, et al. Learning the Difference that Makes a Difference with Counterfactually-Augmented Data, 2020, ICLR.
[37] Jeffrey Heer, et al. Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models, 2021, ACL.
[38] D. Hilton. Knowledge-Based Causal Attribution: The Abnormal Conditions Focus Model, 2004.
[39] Martin Wattenberg, et al. Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV), 2017, ICML.
[40] Regina Barzilay, et al. Rationalizing Neural Predictions, 2016, EMNLP.
[41] Yonatan Belinkov, et al. Probing the Probing Paradigm: Does Probing Accuracy Entail Task Relevance?, 2020, EACL.
[42] Daniel Jurafsky, et al. Understanding Neural Networks through Representation Erasure, 2016, arXiv.
[43] Florian Mohnert, et al. Under the Hood: Using Diagnostic Classifiers to Investigate and Improve how Language Models Track Agreement Information, 2018, BlackboxNLP@EMNLP.
[44] Germund Hesslow, et al. The problem of causal selection, 1988.
[45] Yejin Choi, et al. Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics, 2020, EMNLP.
[46] R. Thomas McCoy, et al. Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference, 2019, ACL.
[47] Rachel Rudinger, et al. Hypothesis Only Baselines in Natural Language Inference, 2018, *SEM.
[48] Yang Liu, et al. Actionable Recourse in Linear Classification, 2018, FAT.
[49] Trevor Darrell, et al. Contrastive Examples for Addressing the Tyranny of the Majority, 2020, arXiv.
[50] John Hewitt, et al. Designing and Interpreting Probes with Control Tasks, 2019, EMNLP.
[51] Yonatan Belinkov, et al. Causal Mediation Analysis for Interpreting Neural NLP: The Case of Gender Bias, 2020, arXiv.
[52] Keith A. Markus, et al. Making Things Happen: A Theory of Causal Explanation, 2007.
[53] Richard Meyes, et al. Under the Hood of Neural Networks: Characterizing Learned Representations by Functional Neuron Populations and Network Ablations, 2020, arXiv.
[54] Omer Levy, et al. Annotation Artifacts in Natural Language Inference Data, 2018, NAACL.
[55] Luke S. Zettlemoyer, et al. AllenNLP: A Deep Semantic Natural Language Processing Platform, 2018, arXiv.