Composition of Relational Features with an Application to Explaining Black-Box Predictors

Relational machine learning programs like those developed in Inductive Logic Programming (ILP) offer several advantages: (1) the ability to model complex relationships amongst data instances; (2) the use of domain-specific relations during model construction; and (3) the models constructed are human-readable, which is often one step closer to being human-understandable. However, these ILP-like methods have not been able to capitalise fully on the rapid hardware, software and algorithmic developments fuelling current progress in deep neural networks. In this paper, we treat relational features as functions and use the notion of generalised composition of functions to derive complex functions from simpler ones. We formulate the notion of a set of $\text{M}$-simple features in a mode language $\text{M}$ and identify two composition operators ($\rho_1$ and $\rho_2$) from which all possible complex features can be derived. We use these results to implement a form of "explainable neural network" called Compositional Relational Machines, or CRMs, which are labelled directed acyclic graphs. The vertex-label for any vertex $j$ in the CRM contains a feature-function $f_j$ and a continuous activation function $g_j$. If $j$ is a "non-input" vertex, then $f_j$ is the composition of the features associated with the direct predecessor vertices of $j$. Our focus is on CRMs in which input vertices (those without any direct predecessors) all have $\text{M}$-simple features in their vertex-labels. We provide a randomised procedure for constructing and learning such CRMs. Using a notion of explanations based on the compositional structure of features in a CRM, we provide empirical evidence, on synthetic data, of the ability to identify appropriate explanations; and we demonstrate the use of CRMs as 'explanation machines' for black-box models that do not provide explanations for their predictions.
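
The abstract's description of a CRM as a labelled DAG can be pictured concretely. The following is a minimal, illustrative Python sketch, assuming Boolean-valued relational features at each vertex and a sigmoid activation; the names (`Vertex`, `feature`, `activate`) and the simple conjunctive composition used here are hypothetical and are not the paper's implementation or its $\rho_1$/$\rho_2$ operators.

```python
# Minimal sketch (not the authors' implementation) of a CRM-style labelled DAG.
# Each vertex j carries a feature-function f_j and an activation g_j; a
# non-input vertex's feature is a composition of its direct predecessors'
# feature values, as described in the abstract.

import math

class Vertex:
    def __init__(self, name, feature_fn, activation_fn, predecessors=()):
        self.name = name
        self.feature_fn = feature_fn        # Boolean-valued relational feature f_j
        self.activation_fn = activation_fn  # continuous activation g_j
        self.predecessors = list(predecessors)

    def feature(self, x):
        if not self.predecessors:           # input vertex: M-simple feature on x
            return self.feature_fn(x)
        # non-input vertex: compose the feature values of direct predecessors
        return self.feature_fn([p.feature(x) for p in self.predecessors])

    def activate(self, x):
        # apply g_j to the 0/1 value of f_j (sigmoid assumed here)
        return self.activation_fn(float(self.feature(x)))

sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))

# Toy instance: x is a pair of atom types; the input vertices test each atom.
is_carbon = Vertex("c1", lambda x: x[0] == "carbon", sigmoid)
is_oxygen = Vertex("c2", lambda x: x[1] == "oxygen", sigmoid)
# Hypothetical composition: conjunction of the predecessors' feature values.
both = Vertex("c3", lambda vals: all(vals), sigmoid,
              predecessors=[is_carbon, is_oxygen])

print(both.feature(("carbon", "oxygen")))     # True
print(both.activate(("carbon", "nitrogen")))  # sigmoid(0.0) = 0.5
```

In the paper the input-vertex features are drawn from the $\text{M}$-simple set and the graph structure is produced by a randomised construction procedure; the sketch above simply fixes a hand-written three-vertex graph to show how a non-input vertex obtains its feature value from its predecessors.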
