Online modeling of proactive moderation system for auction fraud detection

We consider the problem of building online machine-learned models for detecting auction fraud on e-commerce web sites. Since the emergence of the World Wide Web, online shopping and online auctions have steadily gained popularity. While people enjoy the benefits of online trading, criminals also take advantage of it to conduct fraudulent activities against honest parties for illegal profit. Hence proactive fraud-detection moderation systems are commonly applied in practice to detect and prevent such illegal and fraudulent activities. Machine-learned models, especially those that are learned online, are able to catch fraud more efficiently and quickly than human-tuned rule-based systems. In this paper, we propose an online probit model framework that simultaneously takes into account online feature selection, coefficient bounds from human knowledge, and multiple instance learning. Through empirical experiments on a real-world online auction fraud detection data set, we show that this model can potentially detect more fraud and significantly reduce customer complaints compared to several baseline models and the human-tuned rule-based system.
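To make the idea of an online probit model with coefficient bounds more concrete, the following is a minimal sketch, not the framework proposed in the paper. It assumes a plain projected stochastic-gradient update on the probit log-likelihood in place of the paper's own estimation procedure, and it leaves out online feature selection and multiple instance learning entirely. The function names (`predict`, `online_update`), the learning rate, and the toy data are all hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal CDF, used as the probit link."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def predict(w, x):
    """Estimated probability that a case with features x is fraud."""
    return norm_cdf(sum(wi * xi for wi, xi in zip(w, x)))

def online_update(w, x, y, bounds, lr=0.1):
    """One projected gradient step for a single labeled case (y in {0, 1}).

    bounds holds one (low, high) pair per coefficient; this is an assumed
    way of encoding expert knowledge such as "this feature can only
    increase the fraud score" as (0, inf).
    """
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = min(max(norm_cdf(z), 1e-9), 1.0 - 1e-9)  # clip to avoid division by zero
    # Derivative of y*log(p) + (1 - y)*log(1 - p) with respect to z, p = Phi(z)
    g = norm_pdf(z) * (y - p) / (p * (1.0 - p))
    updated = []
    for wi, xi, (low, high) in zip(w, x, bounds):
        step = wi + lr * g * xi
        updated.append(min(max(step, low), high))  # project back into the bounds
    return updated

# Toy stream: the first feature is a constant bias term, the second is a
# hypothetical risk signal whose coefficient is constrained to be non-negative.
inf = float("inf")
bounds = [(-inf, inf), (0.0, inf)]
w = [0.0, 0.0]
stream = [([1.0, 2.0], 1), ([1.0, -1.0], 0), ([1.0, 1.5], 1), ([1.0, -0.5], 0)]
for x, y in stream:
    w = online_update(w, x, y, bounds)
```

Each incoming case updates the coefficients once, so the model adapts as new labeled cases arrive, and the projection step keeps every coefficient inside the range supplied by the human expert.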
