Adversarial data poisoning attacks against the PC learning algorithm

ABSTRACT Data integrity is a key component in the effective design and use of Bayesian network structure learning algorithms, in particular the PC algorithm. This research demonstrates how strongly the learned model depends on the integrity of its input data, in order to emphasize the need to consider data integrity carefully during the development and use of machine learning tools. To this end, we study how an adversary could generate a desired network with the PC algorithm. Given a Bayesian network and a database generated by it, and a second Bayesian network that is equal to the first except for a minor change such as a missing link, a reversed link, or an additional link, we explore and analyze the minimal number of changes (additions, deletions, or substitutions) to the database that lead to a modified database which, when given as input to the PC algorithm, results in the second network.
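To make the question of "how many changes" concrete, the following is a minimal sketch and not the method of the paper. It assumes two binary variables, a Pearson chi-square test of marginal independence at significance level 0.05 (the kind of test PC applies with an empty conditioning set), and an adversary who only appends rows of one fixed configuration. The function names `chi2_statistic`, `is_dependent`, and `rows_to_add_link` are hypothetical, and the count returned is an upper bound for this one strategy, not a proven minimum.

```python
# Critical value of the chi-square distribution with 1 degree of freedom
# at significance level alpha = 0.05 (standard table value).
CHI2_CRIT_1DOF = 3.841


def chi2_statistic(table):
    """Pearson chi-square statistic for a 2x2 contingency table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def is_dependent(table):
    """True if the test rejects independence, i.e. a link X - Y would be kept."""
    return chi2_statistic(table) > CHI2_CRIT_1DOF


def rows_to_add_link(table, cell=(0, 0), max_rows=10_000):
    """Append rows with the configuration `cell` until the test reports dependence.

    Returns (number of rows added, poisoned table). Assumption: the adversary
    uses additions only and always adds the same configuration.
    """
    poisoned = [list(row) for row in table]
    added = 0
    while not is_dependent(poisoned):
        if added >= max_rows:
            raise ValueError("test decision did not flip within max_rows")
        poisoned[cell[0]][cell[1]] += 1
        added += 1
    return added, poisoned


if __name__ == "__main__":
    # A clean database in which X and Y are exactly independent.
    clean = [[25, 25], [25, 25]]
    added, poisoned = rows_to_add_link(clean)
    print(added, poisoned)
```

The sketch covers only the "additional link" case at the first level of PC. Removing or reversing a link would require deletions or substitutions and conditional independence tests with non-empty conditioning sets, which are not modeled here.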
