Classification for Fraud Detection with Social Network Analysis

Fraud causes large losses worldwide, both to state treasuries and to private companies. The incentive to detect and fight fraud is therefore high, yet despite continuous efforts the problem is far from solved. Characterizing fraudulent activity raises many problems, chief among them the specificities of fraud in each business domain. Despite these differences, building a classifier for fraud detection almost always requires dealing with unbalanced datasets, since fraudulent records are usually few compared with non-fraudulent ones. This work describes two types of techniques for fraud detection: techniques at the preprocessing level, where the goal is to balance the dataset, and techniques at the processing level, where the objective is to apply different error costs to fraudulent and non-fraudulent cases. In addition, since organizations and people increasingly form associations in order to commit fraud, a new method is proposed that uses this information to improve the training of classifiers for fraud detection. In particular, the new method identifies patterns in the social networks of fraudulent organizations and uses them to enrich the description of each entity. The enriched data is then used jointly with balancing techniques to produce a better classifier for identifying fraud.
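
The three ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the method of this work: random oversampling stands in for the preprocessing-level balancing techniques, a weighted error count stands in for processing-level error costs, and the share of known-fraudulent direct associates stands in for the social network patterns, which the abstract does not specify. All function names, the cost values, and the toy data are hypothetical.

```python
import random
from collections import Counter


def oversample_minority(records, labels, seed=0):
    """Preprocessing level: duplicate minority-class records at random
    until every class is as large as the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_records, out_labels = list(records), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(records, labels) if y == cls]
        for _ in range(target - n):
            out_records.append(rng.choice(pool))
            out_labels.append(cls)
    return out_records, out_labels


def weighted_error(y_true, y_pred, fn_cost=10.0, fp_cost=1.0):
    """Processing level: a missed fraud (false negative) is charged more
    than a false alarm (false positive). The costs are assumed values."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 0:
            total += fn_cost
        elif t == 0 and p == 1:
            total += fp_cost
    return total


def fraud_neighbor_fraction(entity, edges, known_fraud):
    """Enrichment: fraction of an entity's direct associates that are
    already known to be fraudulent (a simple stand-in network feature)."""
    neighbors = {b for a, b in edges if a == entity}
    neighbors |= {a for a, b in edges if b == entity}
    if not neighbors:
        return 0.0
    return len(neighbors & known_fraud) / len(neighbors)


# Toy data: label 1 marks a fraudulent record, label 0 a legitimate one.
records = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = [0, 0, 0, 0, 1]
balanced_records, balanced_labels = oversample_minority(records, labels)

# Undirected association edges between hypothetical entities.
edges = [("A", "B"), ("C", "A"), ("A", "D")]
enriched_feature = fraud_neighbor_fraction("A", edges, {"B", "C"})
```

In a pipeline of the kind the abstract outlines, a feature such as `enriched_feature` would be appended to each entity's record before balancing, and a cost such as `weighted_error` would guide training or model selection.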
