As classifiers are deployed to detect malicious behavior ranging from spam to terrorism, adversaries modify their behaviors to avoid detection (e.g., [4, 3, 6]). This makes the very behavior the classifier is trying to detect a function of the classifier itself. Learners that account for concept drift (e.g., [5]) are not sufficient, since they do not allow the change in concept to depend on the classifier. As a result, humans must adapt the classifier with each new attack. Ideally, we would like to see classifiers that are resistant to attack and that respond to successful attacks automatically. In this abstract, we argue that the development of such classifiers requires new frameworks combining machine learning and game theory, taking into account the utilities and costs of both the classification system and its adversary. We have recently developed such a framework that allows us to identify weaknesses in classification systems, predict how an adversary could exploit them, and even deploy preemptive defenses against these exploits. Although theoretically motivated, these methods achieve excellent empirical results in realistic email spam filtering domains.

In general, we assume that the goal of the adversary is to evade detection while minimizing cost. Consider the task of an email spammer. The goal is to get an email message past a spam filter, and the cost comes from modifying the message by adding or removing words. These changes may make the spam more likely to pass through the filter, but they may also make for a less effective sales pitch. In credit card fraud, purchases that are less desirable to the thief may also be less likely to be flagged as suspicious. Terrorists may attempt to disguise their activities to avoid detection, but doing so makes their operations more expensive. Even search engine optimization can be seen as an attempt to gain a higher ranking with minimal web page modifications and infrastructure investment. The advantage of a general framework is that it can be applied to a wide variety of important real-world problems.

In Dalvi et al. [2], we investigate automatically adjusting a classifier by predicting the adversary's behavior in advance. We model utility and cost functions for both the classifier and the adversary and compute optimal strategies for a sequential game. First, the classifier learns a cost-sensitive classification function on presumably untainted data. Next, the adversary, who is assumed to have full knowledge of this function, modifies malicious examples to make them appear innocent while minimizing its own cost. Finally, the classifier, assumed to have full knowledge of the adversary's utility, adjusts its classification strategy by testing whether innocent-looking instances could actually be optimally modified versions of malicious instances. This sequence can be repeated any number of times as the adversary and classifier iteratively respond to each other.

We evaluated our methods by taking publicly available email databases and running our adversary and classification algorithms with different utility settings and cost models. Against every attack, the adversary-aware classifier vastly outperformed an adversary-ignorant baseline. It remains to be seen whether these methods can be effective in the real world, where information is much more limited and the space of possible moves by the adversary is not known in advance.
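To make the adversary's side of this sequential game concrete, the sketch below shows one simple way an adversary with full knowledge of a linear (e.g., naive Bayes) spam filter might choose low-cost modifications, greedily trading score reduction against a per-word cost of adding or removing words. The names (evade, weights, cost, budget) are hypothetical, and the greedy choice is only an approximation; Dalvi et al. [2] compute the adversary's minimum-cost modification exactly rather than greedily.

# Illustrative sketch only: greedy approximation of the adversary's best
# response against a known linear spam filter, under assumed names.
def evade(x, weights, bias, cost, budget):
    """x: dict word -> 0/1 presence; weights, bias define a linear filter
    where score > 0 means 'spam'; cost[word] is the price of toggling it."""
    x = dict(x)
    score = bias + sum(w for word, w in weights.items() if x.get(word, 0))
    spent = 0.0
    # Consider changes in order of score reduction per unit cost.
    for word in sorted(weights, key=lambda w: -abs(weights[w]) / cost[w]):
        if score <= 0:
            break                              # already labeled non-spam
        # Removing a present spammy word, or adding an absent hammy word,
        # lowers the spam score; anything else is unhelpful.
        gain = weights[word] if x.get(word, 0) else -weights[word]
        if gain <= 0 or spent + cost[word] > budget:
            continue                           # unhelpful or too expensive
        x[word] = 1 - x.get(word, 0)           # toggle the word
        score -= gain
        spent += cost[word]
    return (x, spent) if score <= 0 else None  # None: evasion not worth the cost

The adversary-aware classifier of [2] then reasons in the opposite direction, asking whether an innocent-looking message could plausibly be the output of such a modification applied to some spam message.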
In Lowd and Meek [7], we look at a similar scenario, but from the perspective of an adversary with limited information. As in Dalvi et al. [2], we assume that the adversary wishes to have instances (e.g., emails) misclassified with minimal cost (e.g., the number of added or removed words). However, instead of assuming complete knowledge, the adversary is allowed a polynomial number of membership queries to test which labels the classifier would assign to manufactured instances. For the case of spam, this can be done by seeing whether test messages reach an email account protected by the spam filter. From these queries, the adversary must find an innocently-labeled instance whose cost is within a small factor of the minimum cost needed to evade the classifier.
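The limited-information setting can be sketched in the same style using nothing but the membership oracle described above. The fragment below is a simplified illustration in the spirit of, but not identical to, the algorithms analyzed in Lowd and Meek [7]; it assumes Boolean word features with unit costs, and the names (query, x_spam, x_ham) are hypothetical. Starting from any message known to get through the filter, it repeatedly moves individual words toward the desired spam message, keeping only the changes that still pass, and therefore issues only polynomially many queries.

# Illustrative sketch only: a greedy, query-based search for a low-cost
# innocently-labeled message, given just a membership oracle `query`
# (returns True if the filter labels the message spam).
def find_cheap_evasion(query, x_spam, x_ham):
    """x_spam: the message the adversary wants delivered (labeled spam).
    x_ham: any message known to be labeled non-spam.
    Both are dicts mapping word -> 0/1 presence."""
    current = dict(x_ham)              # start from something that gets through
    words = set(x_spam) | set(x_ham)
    improved = True
    while improved:                    # repeat passes until no single change helps
        improved = False
        for w in words:
            if current.get(w, 0) == x_spam.get(w, 0):
                continue               # already agrees with the desired message
            candidate = dict(current)
            candidate[w] = x_spam.get(w, 0)   # move one word toward the spam message
            if not query(candidate):   # still gets past the filter?
                current = candidate
                improved = True
    return current                     # non-spam-labeled, few changes from x_spam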
[1] D. Angluin. Queries and concept learning. Machine Learning, 1988.
[2] N. Dalvi et al. Adversarial classification. KDD, 2004.
[3] T. Fawcett. "In vivo" spam filtering: a challenge problem for KDD. SIGKDD Explorations, 2003.
[4] T. Fawcett and F. Provost. Adaptive fraud detection. Data Mining and Knowledge Discovery, 1997.
[5] G. Hulten et al. Mining time-changing data streams. KDD, 2001.
[6] D. Jensen et al. Information awareness: a prospective technical assessment. KDD, 2003.
[7] D. Lowd and C. Meek. Adversarial learning. KDD, 2005.
[8] D. Lowd and C. Meek. Good word attacks on statistical spam filters. CEAS, 2005.
[9] L. G. Valiant. A theory of the learnable. Communications of the ACM, 1984.