Defending Regression Learners Against Poisoning Attacks

Regression models are widely used, from engineering applications to financial forecasting, and are vulnerable to targeted malicious attacks such as training data poisoning, through which adversaries can manipulate the models' predictions. Previous attempts to address this problem rely on assumptions about the nature of the attack or the attacker, or overestimate the knowledge available to the learner, making them impractical. We introduce N-LID, a novel measure based on Local Intrinsic Dimensionality (LID) that quantifies the deviation of a data point's LID from the LIDs of its neighbors. We show that N-LID can distinguish poisoned samples from normal samples, and we propose an N-LID based defense that makes no assumptions about the attacker. Through extensive numerical experiments on benchmark datasets, we show that the proposed defense mechanism outperforms state-of-the-art defenses in both prediction accuracy (up to 76% lower MSE than an undefended ridge model) and running time.
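The abstract does not spell out how LID or N-LID are computed, so the following is a minimal Python sketch under two stated assumptions: LID is estimated with the standard maximum-likelihood estimator from k-nearest-neighbor distances, and N-LID is taken as the ratio of a point's LID estimate to the mean estimate over its k nearest neighbors. The function names `lid_mle` and `n_lid` are hypothetical, and the weighted ridge fit at the end is one illustrative way to use N-LID scores defensively, not necessarily the paper's exact weighting scheme.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neighbors import NearestNeighbors


def lid_mle(X, k=20):
    """Per-point LID via the maximum-likelihood estimator from
    k-nearest-neighbor distances:
        LID(x) ~= -( (1/k) * sum_i log(r_i / r_k) )^{-1}
    where r_i is the distance from x to its i-th nearest neighbor."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dists, _ = nn.kneighbors(X)
    dists = np.maximum(dists[:, 1:], 1e-12)  # drop self-distance; avoid log(0)
    r_k = dists[:, -1:]                      # distance to the k-th neighbor
    mean_log_ratio = np.log(dists / r_k).mean(axis=1)
    return -1.0 / np.minimum(mean_log_ratio, -1e-12)


def n_lid(X, k=20):
    """N-LID sketch (assumed formula): a point's LID estimate divided by
    the mean LID estimate of its k nearest neighbors. Scores far from 1
    flag points whose local dimensionality deviates from their
    neighborhood -- candidate poisoned samples."""
    lid = lid_mle(X, k)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    return lid / lid[idx[:, 1:]].mean(axis=1)


# Illustrative defense on synthetic data: downweight suspicious points
# in a weighted ridge fit (one plausible use of N-LID scores).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 10))
y_train = X_train @ rng.normal(size=10) + 0.1 * rng.normal(size=200)

weights = 1.0 / n_lid(X_train, k=20)
model = Ridge(alpha=1.0).fit(X_train, y_train, sample_weight=weights)
```

Weighting each sample inversely to its N-LID score keeps the learner attack-agnostic: high-scoring points are downweighted rather than removed, so clean points that merely look unusual still contribute to the fit.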
