Post-hoc Interpretability for Neural NLP: A Survey
[1] Aoshuang Ye,et al. Towards a Robust Deep Neural Network Against Adversarial Texts: A Survey , 2023, IEEE Transactions on Knowledge and Data Engineering.
[2] Katja Filippova,et al. “Will You Find These Shortcuts?” A Protocol for Evaluating the Faithfulness of Input Salience Methods for Text Classification , 2021, EMNLP.
[3] Eduard Hovy,et al. Interpreting Deep Learning Models in Natural Language Processing: A Review , 2021, ArXiv.
[4] Siva Reddy,et al. Evaluating the Faithfulness of Importance Measures in NLP by Recursively Masking Allegedly Important Tokens and Retraining , 2021, EMNLP.
[5] Thomas Lukasiewicz,et al. Are Training Resources Insufficient? Predict First Then Explain! , 2021, ArXiv.
[6] Sarath Chandar,et al. Local Structure Matters Most: Perturbation Study in NLU , 2021, FINDINGS.
[7] Isabelle Augenstein,et al. Is Sparse Attention more Interpretable? , 2021, ACL.
[8] Emily M. Bender,et al. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜 , 2021, FAccT.
[9] L. A. Ureña-López,et al. A Survey on Bias in Deep NLP , 2021, Applied Sciences.
[10] Yonatan Belinkov,et al. Probing Classifiers: Promises, Shortcomings, and Advances , 2021, CL.
[11] Ana Marasović,et al. Teach Me to Explain: A Review of Datasets for Explainable Natural Language Processing , 2021, NeurIPS Datasets and Benchmarks.
[12] Neil Safier,et al. Translating , 2021, Information.
[13] Jeffrey Heer,et al. Polyjuice: Generating Counterfactuals for Explaining, Evaluating, and Improving Models , 2021, ACL.
[14] Caiming Xiong,et al. FastIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging , 2020, EMNLP.
[15] Joelle Pineau,et al. UnNatural Language Inference , 2020, ACL.
[16] Matthew E. Peters,et al. Explaining NLP Models via Minimal Contrastive Editing (MiCE) , 2020, FINDINGS.
[17] Daniel E. Ho,et al. Affirmative Algorithms: The Legal Grounds for Fairness as Awareness , 2020, ArXiv.
[18] Sameer Singh,et al. Interpreting Predictions of NLP Models , 2020, EMNLP.
[19] Jasmijn Bastings,et al. The elephant in the interpretability room: Why use attention as explanation when we have saliency methods? , 2020, BLACKBOXNLP.
[20] Shiyue Zhang,et al. Leakage-Adjusted Simulatability: Can Models Generate Non-Trivial Explanations of Their Behavior in Natural Language? , 2020, FINDINGS.
[21] R. Aharonov,et al. A Survey of the State of Explainable AI for Natural Language Processing , 2020, AACL.
[22] Sebastian Gehrmann,et al. The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models , 2020, EMNLP.
[23] Yonatan Belinkov,et al. Interpretability and Analysis in Neural NLP , 2020, ACL.
[24] Jacob Andreas,et al. Compositional Explanations of Neurons , 2020, NeurIPS.
[25] F. Rossi,et al. The State of the Art in Enhancing Trust in Machine Learning Models with the Use of Visualizations , 2020, Comput. Graph. Forum.
[26] Mark Chen,et al. Language Models are Few-Shot Learners , 2020, NeurIPS.
[27] Sawan Kumar,et al. NILE : Natural Language Inference with Faithful Natural Language Explanations , 2020, ACL.
[28] Yulia Tsvetkov,et al. Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions , 2020, ACL.
[29] Shafiq R. Joty,et al. It’s Morphin’ Time! Combating Linguistic Discrimination with Inflectional Perturbations , 2020, ACL.
[30] Sameer Singh,et al. Obtaining Faithful Interpretations from Compositional Neural Networks , 2020, ACL.
[31] Willem Zuidema,et al. Quantifying Attention Flow in Transformers , 2020, ACL.
[32] Jonathan Berant,et al. Explaining Question Answering Models through Text Generation , 2020, ArXiv.
[33] Yoav Goldberg,et al. Towards Faithfully Interpretable NLP Systems: How Should We Define and Evaluate Faithfulness? , 2020, ACL.
[34] Noah A. Smith,et al. Evaluating Models’ Local Decision Boundaries via Contrast Sets , 2020, FINDINGS.
[35] Ivan Titov,et al. Information-Theoretic Probing with Minimum Description Length , 2020, EMNLP.
[36] Amit Dhurandhar,et al. Model Agnostic Multilevel Explanations , 2020, NeurIPS.
[37] Anh Nguyen,et al. SAM: The Sensitivity of Attribution Methods to Hyperparameters , 2020, CVPR.
[38] Anna Rumshisky,et al. A Primer in BERTology: What We Know About How BERT Works , 2020, TACL.
[39] Frederick Liu,et al. Estimating Training Data Influence by Tracking Gradient Descent , 2020, NeurIPS.
[40] Alec Radford,et al. Scaling Laws for Neural Language Models , 2020, ArXiv.
[41] D. Roth,et al. Neural Module Networks for Reasoning over Text , 2019, ICLR.
[42] Brandon M. Greenwell,et al. Interpretable Machine Learning , 2019, Hands-On Machine Learning with R.
[43] Himabindu Lakkaraju,et al. Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods , 2019, AIES.
[44] Brian W. Powers,et al. Dissecting racial bias in an algorithm used to manage the health of populations , 2019, Science.
[45] Peter J. Liu,et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..
[46] Rémi Louf,et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing , 2019, ArXiv.
[47] Zachary Chase Lipton,et al. Learning the Difference that Makes a Difference with Counterfactually-Augmented Data , 2019, ICLR.
[48] Manaal Faruqui,et al. Attention Interpretability Across NLP Tasks , 2019, ArXiv.
[49] Zachary Chase Lipton,et al. Learning to Deceive with Attention-Based Explanations , 2019, ACL.
[50] Ankur Taly,et al. Explainable machine learning in deployment , 2019, FAT*.
[51] John Hewitt,et al. Designing and Interpreting Probes with Control Tasks , 2019, EMNLP.
[52] Kristina Lerman,et al. A Survey on Bias and Fairness in Machine Learning , 2019, ACM Comput. Surv..
[53] Sameer Singh,et al. Universal Adversarial Triggers for Attacking and Analyzing NLP , 2019, EMNLP.
[54] Yuval Pinter,et al. Attention is not not Explanation , 2019, EMNLP.
[55] Roger Wattenhofer,et al. On Identifiability in Transformers , 2019, ICLR.
[56] Omer Levy,et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach , 2019, ArXiv.
[57] Jaime S. Cardoso,et al. Machine Learning Interpretability: A Survey on Methods and Metrics , 2019, Electronics.
[58] Cuntai Guan,et al. A Survey on Explainable Artificial Intelligence (XAI): Toward Medical XAI , 2019, IEEE Transactions on Neural Networks and Learning Systems.
[59] Yash Goyal,et al. Explaining Classifiers with Causal Concept Effect (CaCE) , 2019, ArXiv.
[60] Omer Levy,et al. What Does BERT Look at? An Analysis of BERT’s Attention , 2019, BlackboxNLP@ACL.
[61] Richard Socher,et al. Explain Yourself! Leveraging Language Models for Commonsense Reasoning , 2019, ACL.
[62] Martin Wattenberg,et al. Visualizing and Measuring the Geometry of BERT , 2019, NeurIPS.
[63] Noah A. Smith,et al. Is Attention Interpretable? , 2019, ACL.
[64] Alex Wang,et al. What do you learn from context? Probing for sentence structure in contextualized word representations , 2019, ICLR.
[65] Dipanjan Das,et al. BERT Rediscovers the Classical NLP Pipeline , 2019, ACL.
[66] Omer Levy,et al. Are Sixteen Heads Really Better than One? , 2019, NeurIPS.
[67] Jason Baldridge,et al. PAWS: Paraphrase Adversaries from Word Scrambling , 2019, NAACL.
[68] Andreas Madsen,et al. Visualizing memorization in RNNs , 2019, Distill.
[69] Byron C. Wallace,et al. Attention is not Explanation , 2019, NAACL.
[70] Run Wang,et al. Towards a Robust Deep Neural Network in Texts: A Survey , 2019.
[71] James Zou,et al. Towards Automatic Concept-based Explanations , 2019, NeurIPS.
[72] R. Thomas McCoy,et al. Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference , 2019, ACL.
[73] Chih-Kuan Yeh,et al. On the (In)fidelity and Sensitivity of Explanations , 2019, NeurIPS.
[74] J. Paisley,et al. Global Explanations of Neural Networks: Mapping the Landscape of Predictions , 2019, AIES.
[75] Yonatan Belinkov,et al. Analysis Methods in Neural Language Processing: A Survey , 2018, TACL.
[76] Thomas Lukasiewicz,et al. e-SNLI: Natural Language Inference with Natural Language Explanations , 2018, NeurIPS.
[77] Cynthia Rudin,et al. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead , 2018, Nature Machine Intelligence.
[78] Pradeep Ravikumar,et al. Representer Point Selection for Explaining Deep Neural Networks , 2018, NeurIPS.
[79] William Yang Wang,et al. Towards Explainable NLP: A Generative Explanation Framework for Text Classification , 2018, ACL.
[80] Samuel R. Bowman,et al. Language Modeling Teaches You More than Translation Does: Lessons Learned Through Auxiliary Syntactic Task Analysis , 2018, BlackboxNLP@EMNLP.
[81] Been Kim,et al. Sanity Checks for Saliency Maps , 2018, NeurIPS.
[82] Amina Adadi,et al. Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI) , 2018, IEEE Access.
[83] Alex Jones,et al. Pre-print , 2018.
[84] Xia Hu,et al. Techniques for interpretable machine learning , 2018, Commun. ACM.
[85] Carlos Guestrin,et al. Semantically Equivalent Adversarial Rules for Debugging NLP models , 2018, ACL.
[86] D. Erhan,et al. A Benchmark for Interpretability Methods in Deep Neural Networks , 2018, NeurIPS.
[87] Ankur Taly,et al. Did the Model Understand the Question? , 2018, ACL.
[88] Guillaume Lample,et al. What you can cram into a single $&!#* vector: Probing sentence embeddings for linguistic properties , 2018, ACL.
[89] Alexander M. Rush,et al. Seq2seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models , 2018, IEEE Transactions on Visualization and Computer Graphics.
[90] Carlos Guestrin,et al. Anchors: High-Precision Model-Agnostic Explanations , 2018, AAAI.
[91] Samuel R. Bowman,et al. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding , 2018, BlackboxNLP@EMNLP.
[92] Roger Wattenhofer,et al. Natural Language Multitasking: Analyzing and Improving Syntactic Saliency of Hidden Representations , 2018, NIPS.
[93] Dejing Dou,et al. HotFlip: White-Box Adversarial Examples for Text Classification , 2017, ACL.
[94] Martin Wattenberg,et al. Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV) , 2017, ICML.
[95] J. Wieting,et al. ParaNMT-50M: Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations , 2017, ACL.
[96] David Weinberger,et al. Accountability of AI Under the Law: The Role of Explanation , 2017, ArXiv.
[97] Dumitru Erhan,et al. The (Un)reliability of saliency methods , 2017, Explainable AI.
[98] Alice H. Oh,et al. Rotated Word Vector Representations and their Interpretability , 2017, EMNLP.
[99] Alun D. Preece,et al. Interpretability of deep learning models: A survey of results , 2017, 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI).
[100] Tommi S. Jaakkola,et al. A causal framework for explaining the predictions of black-box sequence-to-sequence models , 2017, EMNLP.
[101] Tim Miller,et al. Explanation in Artificial Intelligence: Insights from the Social Sciences , 2017, Artif. Intell..
[102] Scott Lundberg,et al. A Unified Approach to Interpreting Model Predictions , 2017, NIPS.
[103] Anca D. Dragan,et al. Translating Neuralese , 2017, ACL.
[104] Samuel R. Bowman,et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference , 2017, NAACL.
[105] Percy Liang,et al. Understanding Black-box Predictions via Influence Functions , 2017, ICML.
[106] Ankur Taly,et al. Axiomatic Attribution for Deep Networks , 2017, ICML.
[107] Been Kim,et al. Towards A Rigorous Science of Interpretable Machine Learning , 2017, ArXiv.
[108] Emmanuel Dupoux,et al. Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies , 2016, TACL.
[109] Xing Shi,et al. Does String-Based Neural MT Learn Source Syntax? , 2016, EMNLP.
[110] Yonatan Belinkov,et al. Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks , 2016, ICLR.
[111] Adam Tauman Kalai,et al. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings , 2016, NIPS.
[112] Regina Barzilay,et al. Rationalizing Neural Predictions , 2016, EMNLP.
[113] Zachary Chase Lipton. The Mythos of Model Interpretability , 2016, ACM Queue.
[114] Neil T. Heffernan,et al. AXIS: Generating Explanations at Scale with Learnersourcing and Machine Learning , 2016, L@S.
[115] Marco Tulio Ribeiro,et al. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier , 2016, NAACL.
[116] Dan Klein,et al. Neural Module Networks , 2015, CVPR.
[117] Arne Köhn,et al. What’s in an Embedding? Analyzing Word Embeddings through Multilingual Evaluation , 2015, EMNLP.
[118] Xinlei Chen,et al. Visualizing and Understanding Neural Models in NLP , 2015, NAACL.
[119] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[120] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[121] Christopher Potts,et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank , 2013, EMNLP.
[122] Andrew Y. Ng,et al. Parsing with Compositional Vector Grammars , 2013, ACL.
[123] Shivaram Kalyanakrishnan,et al. Information Complexity in Bandit Subset Selection , 2013, COLT.
[124] J. Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[125] Jordan L. Boyd-Graber,et al. Reading Tea Leaves: How Humans Interpret Topic Models , 2009, NIPS.
[126] Motoaki Kawanabe,et al. How to Explain Individual Classification Decisions , 2009, J. Mach. Learn. Res..
[127] Jason W. Osborne,et al. Best practices in exploratory factor analysis: four recommendations for getting the most from your analysis , 2005.
[128] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[129] Judea Pearl,et al. Direct and Indirect Effects , 2001, UAI.
[130] Bernhard Schölkopf,et al. A Generalized Representer Theorem , 2001, COLT/EuroCOLT.
[131] L. Shapley. A Value for n-person Games , 1988.
[132] S. Weisberg,et al. Characterizations of an Empirical Influence Function for Detecting Influential Cases in Regression , 1980.
[133] G. A. Ferguson,et al. A general rotation criterion and its use in orthogonal rotation , 1970.
[134] Karl Pearson. On lines and planes of closest fit to systems of points in space , 1901, Philosophical Magazine.
[135] Yonatan Belinkov,et al. Probing Classifiers: Promises, Shortcomings, and Alternatives , 2021, ArXiv.
[136] Jeffrey Heer,et al. Polyjuice: Automated, General-purpose Counterfactual Generation , 2021, ArXiv.
[137] Yonatan Belinkov,et al. Investigating Gender Bias in Language Models Using Causal Mediation Analysis , 2020, NeurIPS.
[138] Lovekesh Vig,et al. Guided-LIME: Structured Sampling based Hybrid Approach towards Explaining Blackbox Machine Learning Models , 2020, CIKM.
[139] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[140] Jonathan Berant,et al. CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge , 2019, NAACL.
[141] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019.
[142] Yejin Choi,et al. WinoGrande: An Adversarial Winograd Schema Challenge at Scale , 2019, ArXiv.
[143] Marko Bohanec,et al. Perturbation-Based Explanations of Prediction Models , 2018, Human and Machine Learning.
[144] Alec Radford,et al. Improving Language Understanding by Generative Pre-Training , 2018.
[145] Michael I. Jordan,et al. Latent Dirichlet Allocation , 2003, J. Mach. Learn. Res..
[146] Geoffrey E. Hinton,et al. Visualizing Data using t-SNE , 2008, J. Mach. Learn. Res..
[147] Miguel Ángel García Cumbreras,et al. Association for Computational Linguistics , 2001.
[148] Dragomir R. Radev,et al. of the Association for Computational Linguistics , 2022.