Classification of Imbalanced Auction Fraud Data

Online auctions attract serious fraud because of the large sums of money involved and the anonymity of users. In auction fraud detection, class imbalance, meaning that fraudulent instances are far fewer than normal ones in bidding transactions, degrades classification performance because classifiers become biased towards the majority class, i.e., normal bidding behavior. Data sampling is among the best-established approaches to the imbalanced learning problem and has been found to improve classification efficiency. In this study, we use a hybrid of data over-sampling and under-sampling to address highly imbalanced auction fraud datasets more effectively. We apply a set of well-known binary classifiers to understand how class imbalance affects the classification results, and we choose the performance metrics most relevant to both imbalanced data and fraud bidding data.
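To make the idea of hybrid sampling concrete, the following is a minimal sketch, not the method used in this study. It assumes binary labels (0 for normal bidding, 1 for fraud), randomly under-samples the majority class, and then over-samples the minority class by interpolating between a fraud instance and one of its nearest fraud neighbours, in the spirit of SMOTE. The function names, the `keep_majority` fraction, and the toy data are all hypothetical choices made for illustration.

```python
import random


def k_nearest(point, pool, k):
    """Return the k rows in `pool` closest to `point` (squared Euclidean), excluding `point` itself."""
    others = [p for p in pool if p is not point]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(point, p)))
    return others[:k]


def interpolate(point, neighbours, rng):
    """Create a synthetic row on the segment between `point` and a random neighbour."""
    other = rng.choice(neighbours)
    gap = rng.random()
    return [a + gap * (b - a) for a, b in zip(point, other)]


def hybrid_resample(X, y, minority=1, majority=0, keep_majority=0.5, k=3, seed=0):
    """Under-sample the majority class, then over-sample the minority class to match it.

    Assumes binary labels and at least two minority rows (illustrative sketch only).
    """
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    majority_rows = [x for x, label in zip(X, y) if label == majority]
    if len(minority_rows) < 2:
        raise ValueError("need at least two minority rows to interpolate")

    # Under-sampling: keep a random fraction of the majority class,
    # but never fewer rows than the minority class already has.
    n_keep = max(len(minority_rows), int(len(majority_rows) * keep_majority))
    n_keep = min(n_keep, len(majority_rows))
    kept_majority = rng.sample(majority_rows, n_keep)

    # Over-sampling: add synthetic minority rows until both classes have equal size.
    synthetic = []
    while len(minority_rows) + len(synthetic) < n_keep:
        base = rng.choice(minority_rows)
        synthetic.append(interpolate(base, k_nearest(base, minority_rows, k), rng))

    X_out = kept_majority + minority_rows + synthetic
    y_out = [majority] * len(kept_majority) + [minority] * (len(minority_rows) + len(synthetic))
    return X_out, y_out


# Toy data (hypothetical): 95 normal bidders and 5 fraudulent ones, two features each.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(95)]
X += [[data_rng.gauss(3, 1), data_rng.gauss(3, 1)] for _ in range(5)]
y = [0] * 95 + [1] * 5

X_res, y_res = hybrid_resample(X, y)
print("before:", y.count(0), "normal,", y.count(1), "fraud")
print("after: ", y_res.count(0), "normal,", y_res.count(1), "fraud")
```

In a sketch like this, resampling would be applied to the training split only, so that the test data keeps the original class distribution when the classifiers are evaluated.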
