Shapley Homology: Topological Analysis of Sample Influence for Neural Networks

Data samples collected for training machine learning models are typically assumed to be independent and identically distributed (i.i.d.). Recent research has demonstrated that this assumption can be problematic because it oversimplifies the manifold of structured data. This observation has motivated research in areas such as data poisoning, model improvement, and explanation of machine learning models. In this work, we study the influence of a sample on determining the intrinsic topological features of its underlying manifold. We propose the Shapley homology framework, which provides a quantitative metric for the influence of a sample on the homology of a simplicial complex. The framework consists of two main parts: homology analysis, where we compute the Betti number of the target topological space, and Shapley value calculation, where we distribute the topological features of the complex built from the data points to the individual points. By interpreting the influence as a probability measure, we further define an entropy that reflects the complexity of the data manifold. We also provide a preliminary discussion of the connection between Shapley homology and the Vapnik-Chervonenkis (VC) dimension. Empirical studies show that, when zero-dimensional Shapley homology is applied to neighborhood graphs, samples with higher influence scores have a greater impact on the accuracy of neural networks trained to determine graph connectivity, and that regular grammars with higher entropy values are more difficult to learn.
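To make the two-stage pipeline concrete, the following is a minimal sketch (not the authors' implementation) of zero-dimensional Shapley homology on an epsilon-neighborhood graph: the characteristic function v(S) is assumed to be the 0-th Betti number (number of connected components) of the subgraph induced by a subset S of samples, exact Shapley values are computed by brute-force subset enumeration, and the resulting influence scores are normalized into a distribution whose entropy serves as a rough complexity measure. The function names, the epsilon parameter, and the clipping used before normalization are illustrative choices, not details taken from the paper.

```python
# Sketch of zero-dimensional Shapley homology on an epsilon-neighborhood graph.
# v(S) = beta_0 of the subgraph induced by S; exact Shapley values via subset
# enumeration (feasible only for a handful of points).
from itertools import combinations
from math import factorial
import numpy as np


def beta_0(points, subset, eps):
    """0-th Betti number (connected components) of the epsilon-graph on `subset`."""
    subset = list(subset)
    if not subset:
        return 0
    parent = {i: i for i in subset}  # union-find over the selected points

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(subset, 2):
        if np.linalg.norm(points[i] - points[j]) <= eps:
            parent[find(i)] = find(j)
    return len({find(i) for i in subset})


def shapley_influence(points, eps):
    """Exact Shapley value of each point for the game v(S) = beta_0(S)."""
    n = len(points)
    phi = np.zeros(n)
    for i in range(n):
        others = [p for p in range(n) if p != i]
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                weight = factorial(r) * factorial(n - r - 1) / factorial(n)
                gain = beta_0(points, S + (i,), eps) - beta_0(points, S, eps)
                phi[i] += weight * gain
    return phi  # sums to beta_0 of the full graph


def entropy(phi):
    """Entropy of influence scores viewed as a distribution (scores clipped at zero)."""
    p = np.clip(phi, 1e-12, None)
    p = p / p.sum()
    return float(-(p * np.log(p)).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two tight clusters plus an isolated point.
    pts = np.vstack([rng.normal(0, 0.1, (3, 2)),
                     rng.normal(3, 0.1, (3, 2)),
                     [[1.5, 1.5]]])
    phi = shapley_influence(pts, eps=0.5)
    print("influence scores:", np.round(phi, 3))
    print("entropy:", round(entropy(phi), 3))
```

In this toy example the isolated point contributes an entire connected component on its own, so it receives a large influence score, while points inside a cluster share the credit for their component; this mirrors the intuition that samples which most affect the topology of the neighborhood graph are the most influential.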
