xRAI: Explainable Representations through AI

We present xRAI, an approach for extracting, from a trained neural network, a symbolic representation of the mathematical function the network was supposed to learn. The approach is based on training a so-called interpretation network that receives the weights and biases of the trained network as input and outputs a numerical representation of the target function, which can then be translated directly into a symbolic form. Using Boolean functions and low-order polynomials as examples, we show that interpretation networks for different classes of functions can be trained offline on synthetic data. Training is efficient, and the quality of the results is promising. Our work contributes to the problem of better understanding neural decision making by making the target function explicit.
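
The abstract describes the pipeline only at a high level; the following is a minimal sketch of the idea, not the paper's actual implementation. All names, network sizes, and hyperparameters here are assumptions: a small "lambda net" is fit to a random degree-2 polynomial in one variable, its flattened weights and biases become one training input for an interpretation net ("i-net"), and the i-net learns to predict the polynomial's coefficients, which translate directly into a symbolic expression.

```python
import torch
import torch.nn as nn

def train_lambda_net(coeffs, steps=300):
    """Fit a small MLP ("lambda net") to f(x) = a*x^2 + b*x + c.
    Architecture and step count are illustrative assumptions."""
    a, b, c = coeffs
    net = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    x = torch.linspace(-1, 1, 64).unsqueeze(1)
    y = a * x**2 + b * x + c
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.mse_loss(net(x), y).backward()
        opt.step()
    # Flatten all weights and biases into a single vector: this is the
    # interpretation net's view of the trained network.
    return torch.cat([p.detach().flatten() for p in net.parameters()])

# Offline synthetic training set: flattened weights -> true coefficients.
torch.manual_seed(0)
inputs, targets = [], []
for _ in range(200):
    coeffs = torch.empty(3).uniform_(-1, 1)
    inputs.append(train_lambda_net(coeffs))
    targets.append(coeffs)
X, Y = torch.stack(inputs), torch.stack(targets)

# Interpretation net: maps a weight vector to the coefficients of the
# function the lambda net was supposed to learn.
inet = nn.Sequential(nn.Linear(X.shape[1], 64), nn.ReLU(), nn.Linear(64, 3))
opt = torch.optim.Adam(inet.parameters(), lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    nn.functional.mse_loss(inet(X), Y).backward()
    opt.step()

# The predicted coefficients translate directly into a symbolic expression.
a, b, c = inet(X[0]).tolist()
print(f"f(x) ~ {a:.2f}*x^2 + {b:.2f}*x + {c:.2f}")
```

The key design point this sketch illustrates is that the i-net is trained entirely offline on synthetically generated (network, function) pairs, so at explanation time a single forward pass suffices to recover a symbolic candidate for what a new trained network computes.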
