Augmenting interpretable models with large language models during training

Recent large language models (LLMs) have demonstrated remarkable prediction performance for a growing array of tasks. However, their proliferation into high-stakes domains (e.g., medicine) and compute-limited settings has created a burgeoning need for interpretability and efficiency. We address this need by proposing Augmented Interpretable Models (Aug-imodels), a framework for leveraging the knowledge learned by LLMs to build extremely efficient and interpretable models. Aug-imodels use LLMs during fitting but not during inference, enabling complete transparency and often more than a 1,000x improvement in inference speed and memory relative to LLMs. We explore two instantiations of Aug-imodels in natural-language processing: (i) Aug-GAM, which augments a generalized additive model with decoupled embeddings from an LLM, and (ii) Aug-Tree, which augments a decision tree with LLM feature expansions. Across a variety of text-classification datasets, both outperform their non-augmented counterparts. Aug-GAM can even outperform much larger models (e.g., the 6-billion-parameter GPT-J), despite having 10,000x fewer parameters and being fully transparent. We further explore Aug-imodels in a natural-language fMRI study, where they generate interesting interpretations from scientific data. All code for using Aug-imodels and reproducing results is available on GitHub.
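
To make the fitting/inference split concrete, below is a minimal sketch of the Aug-GAM idea. It is not the paper's released API; the encoder choice (bert-base-uncased) and all helper names are illustrative assumptions. The LLM embeds each ngram independently during fitting, a linear model is fit over the summed embeddings, and each ngram's effect is then cached as a single scalar, so inference reduces to a lookup-and-sum with no LLM calls.

```python
# Minimal sketch of the Aug-GAM idea (illustrative, not the paper's released API):
# the LLM is used only during fitting to embed each ngram independently; a linear
# model is fit over the summed embeddings, and each ngram's contribution is cached
# as a single scalar so inference is a transparent lookup-and-sum with no LLM.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed encoder
encoder = AutoModel.from_pretrained("bert-base-uncased").eval()

def ngrams(text, n_max=2):
    toks = text.lower().split()
    return [" ".join(toks[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(toks) - n + 1)]

@torch.no_grad()
def embed(ngram):
    # Decoupled embedding: each ngram is encoded on its own, outside its context.
    out = encoder(**tokenizer(ngram, return_tensors="pt"))
    return out.last_hidden_state.mean(dim=1).squeeze(0).numpy()

emb_cache = {}  # ngram -> embedding, filled during fitting

def featurize(text):
    vecs = [emb_cache.setdefault(g, embed(g)) for g in ngrams(text)]
    return np.sum(vecs, axis=0) if vecs else np.zeros(encoder.config.hidden_size)

# ----- fitting: the LLM is called here -----
texts = ["a truly wonderful film", "a dull and boring plot"]  # toy data
labels = [1, 0]
X = np.stack([featurize(t) for t in texts])
clf = LogisticRegression().fit(X, labels)

# ----- inference: no LLM needed -----
# Linearity makes the model additive over ngrams, so each ngram's effect
# collapses to one cached scalar: coefficient vector dotted with its embedding.
contrib = {g: float(clf.coef_[0] @ v) for g, v in emb_cache.items()}

def score(text):
    # Transparent prediction: intercept plus a sum of per-ngram scalars.
    return clf.intercept_[0] + sum(contrib.get(g, 0.0) for g in ngrams(text))

print(score("a truly wonderful plot"))  # positive logit -> positive class
```

In this sketch, ngrams unseen during fitting simply contribute zero at inference; the key point is that prediction becomes a sum of per-ngram scalars that can be inspected directly.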
