
Data Mining Challenges for Electronic Safety: The Case of Fraudulent Intent Detection in E-Mails

Online criminals have adapted traditional snail-mail and door-to-door fraudulent schemes into electronic form. Increasingly, such schemes target an individual's personal e-mail, where they mingle among, and are masked by, honest communications. The targeted and conniving nature of these schemes is an infringement upon an individual's personal privacy, as well as a threat to personal safety. In this paper, we introduce an array of challenges that are ripe for the attention of the data mining research community and that differ markedly from those of combating the general problem of spam. We illustrate how state-of-the-art spam filtering systems fail to capture fraudulent intent hidden in the text of e-mails, and we demonstrate how more robust systems can be engineered using existing data mining tools. We conclude by examining a specific scheme, the Nigerian 4-1-9 advance fee fraud scam, for which we design a learning system capable of accurately identifying the fraudulent intent within an e-mail. Our system is applicable to fraud detection and can serve as a guide for law enforcement agencies in cyber-investigations.

E. Airoldi | B. Malin
