Discrimination prevention in data mining for intrusion and crime detection

Automated data collection has fostered the use of data mining for intrusion and crime detection. Indeed, banks, large corporations, insurance companies, casinos, etc. are increasingly mining data about their customers or employees in view of detecting potential intrusion, fraud or even crime. Mining algorithms are trained from datasets which may be biased in what regards gender, race, religion or other attributes. Furthermore, mining is often outsourced or carried out in cooperation by several entities. For those reasons, discrimination concerns arise. Potential intrusion, fraud or crime should be inferred from objective misbehavior, rather than from sensitive attributes like gender, race or religion. This paper discusses how to clean training datasets and outsourced datasets in such a way that legitimate classification rules can still be extracted but discriminating rules based on sensitive attributes cannot.