Learning Interpretable Concept-Based Models with Human Feedback

Machine learning models that first learn a representation of a domain in terms of human-understandable concepts, and then use that representation to make predictions, have been proposed to facilitate interpretation of and interaction with models trained on high-dimensional data. However, these methods have important limitations: the way they define concepts is not inherently interpretable, and they assume that concept labels either exist for individual instances or can easily be acquired from users. These limitations are particularly acute for high-dimensional tabular features. We propose an approach for learning a set of transparent concept definitions in high-dimensional tabular data that relies on users labeling concept features instead of individual instances. Our method produces concepts that both align with users' intuitive sense of what a concept means and facilitate prediction of the downstream label by a transparent machine learning model. This ensures that the full model is transparent and intuitive, and as predictive as possible given this constraint. Using simulated user feedback on real prediction problems, including one in a clinical domain, we demonstrate that this kind of direct feedback is far more efficient at learning solutions that align with ground-truth concept definitions than alternative transparent approaches that rely on labeling instances or on other existing interaction mechanisms, while maintaining comparable predictive performance.
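To make the pipeline concrete, the following is a minimal, hypothetical sketch of the general idea described above, not the paper's actual algorithm: users assign raw tabular features to named concepts, each concept score is a transparent aggregate of its labeled features, and a sparse linear model maps concept scores to the downstream label. All names (e.g., `concept_feature_labels`) and the synthetic data are assumptions introduced purely for illustration.

```python
# Illustrative sketch only: feature-level concept feedback + transparent downstream model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic high-dimensional tabular data: 1000 instances, 50 features.
X = rng.normal(size=(1000, 50))

# Hypothetical user feedback: feature indices the user assigns to each named concept.
concept_feature_labels = {
    "concept_a": [0, 1, 2],
    "concept_b": [10, 11],
    "concept_c": [20, 21, 22, 23],
}

def concept_scores(X, concept_map):
    # Transparent concept definition: each concept score is the mean of its
    # user-labeled features (features assumed to be on comparable scales).
    return np.column_stack([X[:, idx].mean(axis=1) for idx in concept_map.values()])

Z = concept_scores(X, concept_feature_labels)

# Simulated downstream label driven by the first two concepts (for demonstration).
logits = 2.0 * Z[:, 0] - 1.5 * Z[:, 1]
y = (logits + rng.normal(scale=0.5, size=len(logits)) > 0).astype(int)

# Transparent downstream model: sparse (L1-penalized) logistic regression on concept scores.
Z_train, Z_test, y_train, y_test = train_test_split(Z, y, random_state=0)
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
clf.fit(Z_train, y_train)

print("test accuracy:", clf.score(Z_test, y_test))
print("concept weights:", dict(zip(concept_feature_labels, clf.coef_[0])))
```

Because every concept is an explicit, user-vetted combination of named features and the downstream predictor is a small linear model over those concepts, each stage of the sketch remains inspectable end to end, which is the property the abstract emphasizes.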
