A hybrid fraud scoring and spike detection technique in streaming data

This paper proposes a fraud detection system designed to minimize false alarms. We introduce a technique that combines fraud scoring with spike detection on streaming data over time and space. The technique differentiates normal, fraudulent and anomalous links. It increases the suspicion of fraudulent links using a dynamic global black list and reduces the suspicion of normal links using a dynamic global white list. In addition, it uses spike detection to highlight sudden, sharp rises in the data, which can be indicative of abuse. The purpose is to derive two accurate suspicion scores for every incoming example in real time. Results on mining several thousand credit applications demonstrate that the proposed technique reduces the false alarm rate while maintaining a reasonable hit rate. New insights have also been gained from the relationships between examples. The proposed technique combines the advantages of anomaly detection and supervised techniques, and the addition of spike detection further decreases the false alarm rate. Through this novel integration, the technique is able to foil fraudsters, who continuously morph their styles to avoid detection. Experimental results demonstrating the benefits of the technique are also presented.
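To make the idea of two suspicion scores concrete, the following is a minimal sketch and not the authors' algorithm. It assumes each incoming example is reduced to a list of attribute values (for example a phone number or an e-mail address), that the black and white lists are plain sets maintained elsewhere, and that a spike is measured as the current count of a value relative to its average count over a few past time steps. The class name `HybridScorer`, its parameters and its weights are all hypothetical.

```python
from collections import Counter, deque


class HybridScorer:
    """Toy scorer: a list-based fraud score plus a count-based spike score."""

    def __init__(self, window=3, black_weight=1.0, white_weight=0.5):
        self.black_weight = black_weight      # assumed penalty per black-listed value
        self.white_weight = white_weight      # assumed discount per white-listed value
        self.blacklist = set()                # stands in for the dynamic global black list
        self.whitelist = set()                # stands in for the dynamic global white list
        self.history = deque(maxlen=window)   # value counts for the last `window` steps
        self.current = Counter()              # value counts for the step in progress

    def end_step(self):
        """Close the current time step and start a new one."""
        self.history.append(self.current)
        self.current = Counter()

    def fraud_score(self, values):
        """Raise suspicion for black-listed values, lower it for white-listed ones."""
        score = 0.0
        for v in values:
            if v in self.blacklist:
                score += self.black_weight
            elif v in self.whitelist:
                score -= self.white_weight
        return max(score, 0.0)

    def spike_score(self, values):
        """Largest ratio of a value's current count to its smoothed past average."""
        worst = 0.0
        for v in values:
            past = [counts[v] for counts in self.history]
            baseline = sum(past) / len(past) if past else 0.0
            # Adding 1 to the baseline avoids division by zero for unseen values.
            worst = max(worst, self.current[v] / (baseline + 1.0))
        return worst

    def score(self, values):
        """Record one incoming example and return (fraud_score, spike_score)."""
        for v in values:
            self.current[v] += 1
        return self.fraud_score(values), self.spike_score(values)


scorer = HybridScorer()
scorer.blacklist.add("555-0100")
scorer.whitelist.add("alice@example.com")
print(scorer.score(["555-0199", "alice@example.com"]))
scorer.end_step()
for _ in range(3):
    result = scorer.score(["555-0100"])
print(result)
```

In this sketch a repeated black-listed value raises both scores, while a white-listed value can only lower the fraud score. How the two scores are thresholded or combined, and how the lists are updated over time, is left open here.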
