Logical Explanations for Deep Relational Machines Using Relevance Information

Our interest in this paper is in the construction of symbolic explanations for predictions made by a deep neural network. We focus on deep relational machines (DRMs), first proposed by H. Lodhi. A DRM is a deep network in which the input layer consists of Boolean-valued functions (features) that are defined in terms of relations provided as domain, or background, knowledge. Our DRMs differ from those proposed by Lodhi, which use an Inductive Logic Programming (ILP) engine to first select features: we instead use random selections from a space of features that satisfies some approximate constraints on logical relevance and non-redundancy. But why do the DRMs predict what they do? One way of answering this is the LIME setting, in which readable proxies are constructed for a black-box predictor. The proxies are intended only to model the predictions of the black box in local regions of the instance space. But readability alone may not be enough: to be understandable, the local models must use relevant concepts in a meaningful manner. We investigate the use of a Bayes-like approach to identify logical proxies for local predictions of a DRM. We show that: (a) DRMs with our randomised propositionalization method achieve state-of-the-art predictive performance; (b) models in first-order logic can approximate the DRM's prediction closely in a small local region; and (c) expert-provided relevance information can play the role of a prior to distinguish between logical explanations that perform equivalently on prediction alone.
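The role of relevance information as a prior can be illustrated with a small sketch. This is not the paper's procedure: the feature names, the relevance values, the noise rate, the toy black box, and the scoring rule (log-prior plus a simple noisy-agreement log-likelihood) are all assumptions made for illustration. The sketch only shows how two rules that agree equally well with a black-box predictor in a local region can be separated by a relevance-based prior.

```python
from itertools import combinations
from math import log

# Hypothetical Boolean features standing in for relational features of a DRM.
FEATURES = ("has_nitro_group", "has_benzene_ring", "atom_count_is_even")

# Hypothetical expert-provided relevance values, used as prior weights.
RELEVANCE = {"has_nitro_group": 0.9, "has_benzene_ring": 0.6, "atom_count_is_even": 0.1}

# Assumed probability that a good local rule disagrees with the black box.
NOISE = 0.05


def black_box(x):
    """Toy stand-in for a trained DRM: predicts 1 iff a nitro group is present."""
    return int(x["has_nitro_group"])


# A small local region of the instance space. Here has_nitro_group and
# atom_count_is_even happen to coincide, so they cannot be told apart
# by agreement with the black box alone.
NEIGHBOURHOOD = [
    dict(zip(FEATURES, values))
    for values in [(1, 1, 1), (1, 0, 1), (0, 1, 0), (0, 0, 0)]
]


def rule_predict(rule, x):
    """A rule is a conjunction of features; it predicts 1 iff all of them hold."""
    return int(all(x[f] for f in rule))


def fidelity(rule, instances):
    """Fraction of local instances on which the rule agrees with the black box."""
    agree = sum(rule_predict(rule, x) == black_box(x) for x in instances)
    return agree / len(instances)


def log_score(rule, instances):
    """Bayes-like score: log relevance prior plus log-likelihood of agreement."""
    agree = sum(rule_predict(rule, x) == black_box(x) for x in instances)
    disagree = len(instances) - agree
    log_prior = sum(log(RELEVANCE[f]) for f in rule)
    log_likelihood = agree * log(1 - NOISE) + disagree * log(NOISE)
    return log_prior + log_likelihood


def best_explanation(instances, max_literals=2):
    """Return the highest-scoring conjunction with at most max_literals features."""
    candidates = [
        rule
        for k in range(1, max_literals + 1)
        for rule in combinations(FEATURES, k)
    ]
    return max(candidates, key=lambda rule: log_score(rule, instances))


best = best_explanation(NEIGHBOURHOOD)
print(best, fidelity(best, NEIGHBOURHOOD))
```

In this toy region the rules `has_nitro_group` and `atom_count_is_even` both agree with the black box on every instance, and only the prior favours the rule built from the feature marked as more relevant.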

[1]  Stefan Kramer.  Demand-Driven Construction of Structural Features in ILP, 2001, ILP.

[2]  Ashwin Srinivasan, et al.  Theories for Mutagenicity: A Study in First-Order and Feature-Based Induction, 1996, Artif. Intell.

[3]  Stephen Muggleton, et al.  Inductive logic programming: derivations, successes and shortcomings, 1993, SGAR.

[4]  Stephen Muggleton, et al.  How Does Predicate Invention Affect Human Comprehensibility?, 2016, ILP.

[5]  Ian Stewart, et al.  The Ultimate in Anty-Particles, 1994.

[6]  Tom M. Mitchell, et al.  Explanation-Based Generalization: A Unifying View, 1986, Machine Learning.

[7]  Alexandrin Popescul, et al.  Dynamic Feature Generation for Relational Learning, 2022.

[8]  Stephen Muggleton, et al.  Duce, An Oracle-based Approach to Constructive Induction, 1987, IJCAI.

[9]  Artur S. d'Avila Garcez, et al.  Relational Knowledge Extraction from Neural Networks, 2015, CoCo@NIPS.

[10]  Sebastian Thrun, et al.  Extracting Rules from Artificial Neural Networks with Distributed Representations, 1994, NIPS.

[11]  Christopher John Hogger, et al.  Essentials of logic programming, 1990.

[12]  Krysia Broda, et al.  Neural-symbolic learning systems - foundations and applications, 2012, Perspectives in neural computing.

[13]  Alen D. Shapiro, et al.  Structured induction in expert systems, 1987.

[14]  I. Good.  Good Thinking: The Foundations of Probability and Its Applications, 1983.

[15]  Carlos Guestrin, et al.  "Why Should I Trust You?": Explaining the Predictions of Any Classifier, 2016, ArXiv.

[16]  Krzysztof R. Apt, et al.  Logic Programming, 1990, Handbook of Theoretical Computer Science, Volume B: Formal Models and Semantics.

[17]  Alberto Pettorossi, et al.  Transformation of Logic Programs, 1994.

[18]  Ivan Bratko, et al.  Function Decomposition in Machine Learning, 2001, Machine Learning and Its Applications.

[19]  Ashwin Srinivasan, et al.  What Kinds of Relational Features Are Useful for Statistical Learning?, 2012, ILP.

[20]  Huma Lodhi, et al.  Deep Relational Machines, 2013, ICONIP.

[21]  Artur S. d'Avila Garcez, et al.  The Connectionist Inductive Learning and Logic Programming System, 1999, Applied Intelligence.

[22]  R. King, et al.  Prediction of rodent carcinogenicity bioassays from molecular structure using inductive logic programming, 1996, Environmental health perspectives.

[23]  Stephen Muggleton, et al.  Inverse entailment and progol, 1995, New Generation Computing.

[24]  Arun Sharma, et al.  LIME: A System for Learning Relations, 1998, ALT.

[25]  M J Sternberg, et al.  Structure-activity relationships derived by machine learning: the use of atoms and their bond connectivities to predict mutagenicity by inductive logic programming, 1996, Proceedings of the National Academy of Sciences of the United States of America.

[26]  Ashwin Srinivasan, et al.  Parameter Screening and Optimisation for ILP using Designed Experiments, 2011, J. Mach. Learn. Res.

[27]  Luc De Raedt, et al.  kFOIL: Learning Simple Relational Kernels, 2006, AAAI.

[28]  Artur S. d'Avila Garcez, et al.  Fast relational learning using bottom clause propositionalization with artificial neural networks, 2013, Machine Learning.

[29]  Charu C. Aggarwal, et al.  On the Surprising Behavior of Distance Metrics in High Dimensional Spaces, 2001, ICDT.

[30]  Richard C. T. Lee, et al.  Symbolic logic and mechanical theorem proving, 1973, Computer science classics.

[31]  Dov M. Gabbay, et al.  Labelled Deductive Systems: Volume 1, 1996.

[32]  Ondrej Kuzelka, et al.  Lifted Relational Neural Networks, 2015, CoCo@NIPS.

[33]  Ashwin Srinivasan, et al.  Randomised restarted search in ILP, 2006, Machine Learning.

[34]  Christopher H. Bryant, et al.  Functional genomic hypothesis generation and experimentation by a robot scientist, 2004, Nature.

[35]  Donald Michie, et al.  The Creative Computer: Machine Intelligence and Human Knowledge, 1984.

[36]  Kai-Uwe Kühnberger, et al.  Neural-Symbolic Learning and Reasoning: A Survey and Interpretation, 2017, Neuro-Symbolic Artificial Intelligence.

[37]  Ashwin Srinivasan, et al.  Extracting Context-Sensitive Models in Inductive Logic Programming, 2001, Machine Learning.

[38]  Nils J. Nilsson, et al.  Principles of Artificial Intelligence, 1980, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39]  Ashwin Srinivasan, et al.  Feature construction with Inductive Logic Programming: A Study of Quantitative Predictions of Biological Activity Aided by Structural Attributes, 1999, Data Mining and Knowledge Discovery.

[40]  Jude W. Shavlik, et al.  Using Sampling and Queries to Extract Rules from Trained Neural Networks, 1994, ICML.

[41]  Stephen Muggleton, et al.  Meta-interpretive learning of higher-order dyadic datalog: predicate invention revisited, 2013, Machine Learning.

[42]  Nada Lavrac, et al.  Propositionalization-based relational subgroup discovery with RSD, 2006, Machine Learning.

[43]  Lovekesh Vig, et al.  Large-Scale Assessment of Deep Relational Machines, 2018, ILP.

[44]  Donald Michie, et al.  The superarticulacy phenomenon in the context of software manufacture, 1990.

[45]  Ashwin Srinivasan, et al.  Topic Models with Relational Features for Drug Design, 2012, ILP.

[46]  Stephen Muggleton, et al.  TopLog: ILP Using a Logic Program Declarative Bias, 2008, ICLP.

[47]  Thomas G. Dietterich.  Adaptive computation and machine learning, 1998.

[48]  Céline Rouveirol, et al.  Lazy Propositionalisation for Relational Learning, 2000, ECAI.

[49]  Ashwin Srinivasan, et al.  An Empirical Study of the Use of Relevance Information in Inductive Logic Programming, 2003, J. Mach. Learn. Res.

[50]  Luc De Raedt, et al.  Inductive Logic Programming: Theory and Methods, 1994, J. Log. Program.

[51]  R. Scozzafava, et al.  A coherent qualitative Bayes' theorem and its application in artificial intelligence, 1993, 1993 (2nd) International Symposium on Uncertainty Modeling and Analysis.

[52]  Michael Bain, et al.  Experiments in Non-Monotonic Learning, 1991, ML.

[53]  Hendrik Blockeel, et al.  Top-Down Induction of First Order Logical Decision Trees, 1998, AI Commun.