Augmenting interpretable models with large language models during training

Recent large language models (LLMs) have demonstrated remarkable prediction performance for a growing array of tasks. However, their proliferation into high-stakes domains (e.g., medicine) and compute-limited settings has created a burgeoning need for interpretability and efficiency. We address this need by proposing Augmented Interpretable Models (Aug-imodels), a framework for leveraging the knowledge learned by LLMs to build extremely efficient and interpretable models. Aug-imodels use LLMs during fitting but not during inference, enabling complete transparency and often more than a 1,000x improvement in inference speed and memory relative to LLMs. We explore two instantiations of Aug-imodels in natural-language processing: (i) Aug-GAM, which augments a generalized additive model with decoupled embeddings from an LLM, and (ii) Aug-Tree, which augments a decision tree with LLM feature expansions. Across a variety of text-classification datasets, both outperform their non-augmented counterparts. Aug-GAM can even outperform much larger models (e.g., the 6-billion-parameter GPT-J), despite having 10,000x fewer parameters and being fully transparent. We further explore Aug-imodels in a natural-language fMRI study, where they generate interesting interpretations from scientific data. All code for using Aug-imodels and reproducing results is available on GitHub.
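
To make the fitting/inference split concrete, below is a minimal sketch of the Aug-GAM idea. It is not the paper's released API; the encoder choice (bert-base-uncased) and all helper names are illustrative assumptions. The LLM embeds each ngram independently during fitting, a linear model is fit over the summed embeddings, and each ngram's effect is then cached as a single scalar, so inference reduces to a lookup-and-sum with no LLM calls.

```python
# Minimal sketch of the Aug-GAM idea (illustrative, not the paper's released API):
# the LLM is used only during fitting to embed each ngram independently; a linear
# model is fit over the summed embeddings, and each ngram's contribution is cached
# as a single scalar so inference is a transparent lookup-and-sum with no LLM.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed encoder
encoder = AutoModel.from_pretrained("bert-base-uncased").eval()

def ngrams(text, n_max=2):
    toks = text.lower().split()
    return [" ".join(toks[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(toks) - n + 1)]

@torch.no_grad()
def embed(ngram):
    # Decoupled embedding: each ngram is encoded on its own, outside its context.
    out = encoder(**tokenizer(ngram, return_tensors="pt"))
    return out.last_hidden_state.mean(dim=1).squeeze(0).numpy()

emb_cache = {}  # ngram -> embedding, filled during fitting

def featurize(text):
    vecs = [emb_cache.setdefault(g, embed(g)) for g in ngrams(text)]
    return np.sum(vecs, axis=0) if vecs else np.zeros(encoder.config.hidden_size)

# ----- fitting: the LLM is called here -----
texts = ["a truly wonderful film", "a dull and boring plot"]  # toy data
labels = [1, 0]
X = np.stack([featurize(t) for t in texts])
clf = LogisticRegression().fit(X, labels)

# ----- inference: no LLM needed -----
# Linearity makes the model additive over ngrams, so each ngram's effect
# collapses to one cached scalar: coefficient vector dotted with its embedding.
contrib = {g: float(clf.coef_[0] @ v) for g, v in emb_cache.items()}

def score(text):
    # Transparent prediction: intercept plus a sum of per-ngram scalars.
    return clf.intercept_[0] + sum(contrib.get(g, 0.0) for g in ngrams(text))

print(score("a truly wonderful plot"))  # positive logit -> positive class
```

In this sketch, ngrams unseen during fitting simply contribute zero at inference; the key point is that prediction becomes a sum of per-ngram scalars that can be inspected directly.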
