Data Mining Challenges for Electronic Safety: The Case of Fraudulent Intent Detection in E-Mails

Online criminals have adapted traditional snail mail and door-to-door fraudulent schemes into electronic form. Increasingly, such schemes target an individual’s personal e-mail, where they mingle among, and are masked by, honest communications. The targeted and conniving nature of these schemes is an infringement upon an individual’s personal privacy, as well as a threat to personal safety. In this paper, we introduce an array of challenges that are ripe for the attention of the data mining research community and are vastly different from those of combating the general problem of spam. We illustrate how state-of-the-art spam filtering systems fail to capture fraudulent intent hidden in the text of e-mails, but demonstrate how more robust systems can be engineered using existing data mining tools. We conclude by examining a specific scheme, the Nigerian 4-1-9 advance fee fraud scam, for which we design a learning system capable of accurately identifying the fraudulent intent within an e-mail. Our system is applicable to fraud detection and can serve as a guide for law enforcement agencies in cyber-investigations.