Explaining non-linear Classifier Decisions within Kernel-based Deep Architectures

Nonlinear methods such as deep neural networks achieve state-of-the-art performances in several semantic NLP tasks. However epistemologically transparent decisions are not provided as for the limited interpretability of the underlying acquired neural models. In neural-based semantic inference tasks epistemological transparency corresponds to the ability of tracing back causal connections between the linguistic properties of a input instance and the produced classification output. In this paper, we propose the use of a methodology, called Layerwise Relevance Propagation, over linguistically motivated neural architectures, namely Kernel-based Deep Architectures (KDA), to guide argumentations and explanation inferences. In such a way, each decision provided by a KDA can be linked to real examples, linguistically related to the input instance: these can be used to motivate the network output. Quantitative analysis shows that richer explanations about the semantic and syntagmatic structures of the examples characterize more convincing arguments in two tasks, i.e. question classification and semantic role labeling.

[1]  I. Bratko,et al.  Information-based evaluation criterion for classifier's performance , 2004, Machine Learning.

[2]  Ameet Talwalkar,et al.  Sampling Methods for the Nyström Method , 2012, J. Mach. Learn. Res..

[3]  Aaron C. Courville,et al.  Understanding Representations Learned in Deep Architectures , 2010 .

[4]  Roberto Basili,et al.  HuRIC: a Human Robot Interaction Corpus , 2014, LREC.

[5]  Motoaki Kawanabe,et al.  How to Explain Individual Classification Decisions , 2009, J. Mach. Learn. Res..

[6]  Alexander Binder,et al.  On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation , 2015, PloS one.

[7]  Nello Cristianini,et al.  Kernel Methods for Pattern Analysis , 2004 .

[8]  Charles J. Fillmore,et al.  Frames and the semantics of understanding , 1985 .

[9]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[10]  Roberto Basili,et al.  KELP: a Kernel-based Learning Platform , 2018, J. Mach. Learn. Res..

[11]  Geoffrey E. Hinton,et al.  Learning to combine foveal glimpses with a third-order Boltzmann machine , 2010, NIPS.

[12]  Roberto Basili,et al.  Structured Lexical Similarity via Convolution Kernels on Dependency Trees , 2011, EMNLP.

[13]  Michael Collins,et al.  New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron , 2002, ACL.

[14]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[15]  Geoffrey E. Hinton,et al.  Distilling a Neural Network Into a Soft Decision Tree , 2017, CEx@AI*IA.

[16]  Roberto Basili,et al.  Deep Learning in Semantic Kernel Spaces , 2017, ACL.

[17]  Dan Roth,et al.  Learning question classifiers: the role of semantic information , 2005, Natural Language Engineering.

[18]  Andrew Zisserman,et al.  Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps , 2013, ICLR.

[19]  Roberto Basili,et al.  Tree Kernels for Semantic Role Labeling , 2008, CL.

[20]  Carlos Guestrin,et al.  "Why Should I Trust You?": Explaining the Predictions of Any Classifier , 2016, ArXiv.

[21]  Ivan Titov,et al.  Semantic Role Labeling , 2010, HLT-NAACL.

[22]  Roberto Basili,et al.  Semantic Compositionality in Tree Kernels , 2014, CIKM.