Many classification tasks, such as spam filtering, intrusion detection, and terrorism detection, are complicated by an adversary who wishes to avoid detection. Previous work on adversarial classification has made the unrealistic assumption that the attacker has perfect knowledge of the classifier [2]. In this paper, we introduce the adversarial classifier reverse engineering (ACRE) learning problem, the task of learning sufficient information about a classifier to construct adversarial attacks. We present efficient algorithms for reverse engineering linear classifiers with either continuous or Boolean features and demonstrate their effectiveness using real data from the domain of spam filtering.
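The attack setting can be illustrated with a toy sketch (all names and the brute-force search here are hypothetical illustrations, not the paper's actual ACRE algorithms): an adversary with only black-box label-query access to a linear classifier over Boolean features searches for a small set of feature flips that turns a positively classified instance into a negative one.

```python
import itertools

def make_linear_classifier(weights, threshold):
    """Toy linear classifier over Boolean features: returns True
    ("positive"/spam) when the weighted feature sum meets the threshold.
    The attacker sees only this Boolean output, never the weights."""
    def classify(x):
        return sum(w * xi for w, xi in zip(weights, x)) >= threshold
    return classify

def min_flips_to_evade(x, classify, max_flips=3):
    """Brute-force membership-query attack: try all flip sets of size
    1..max_flips and return the first modified instance the classifier
    labels negative, along with the flipped feature indices.
    Returns None if no small flip set evades the classifier."""
    n = len(x)
    for k in range(1, max_flips + 1):
        for idxs in itertools.combinations(range(n), k):
            y = list(x)
            for i in idxs:
                y[i] ^= 1  # flip Boolean feature i
            if not classify(y):
                return y, idxs
    return None

# Example: hidden weights [2, 1, 1, -1], threshold 2.
clf = make_linear_classifier([2, 1, 1, -1], threshold=2)
spam = [1, 0, 0, 0]                      # classified positive: 2 >= 2
result = min_flips_to_evade(spam, clf)   # finds an evading instance
```

This exhaustive search makes exponentially many queries in the number of flips; the point of the ACRE framework is to do such reverse engineering with polynomially many queries.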
References

[1] Leslie G. Valiant et al., "A theory of the learnable," STOC '84, 1984.
[2] Dana Angluin et al., "Queries and concept learning," Machine Learning, 1988.
[3] Susan T. Dumais et al., "A Bayesian Approach to Filtering Junk E-Mail," AAAI 1998.
[4] Le Zhang et al., "Filtering Junk Mail with a Maximum Entropy Model," 2003.
[5] Pedro M. Domingos et al., "Adversarial classification," KDD 2004.
[6] Christopher Meek et al., "Good Word Attacks on Statistical Spam Filters," CEAS 2005.