Fraud Regulating Policy for E-Commerce via Constrained Contextual Bandits

Fraudulent sellers in e-commerce often promote themselves through fake visits or purchases to inflate their sales, jeopardizing the platform's business environment. Regulating the exposure of these sellers to buyers without disrupting normal online business remains a challenging problem: blocking them entirely and indiscriminately may also kill legitimate transactions and could reduce the platform's total transaction volume. To address this problem, we introduce a regulating valve that blocks fraudulent sellers with a certain probability. To learn the optimal blocking policy, we model the regulating valve as a contextual bandit problem with a constraint on the decline in total transactions. Since existing bandit algorithms cannot incorporate this transaction constraint, we propose a novel bandit algorithm that decides the policy with a set of neural networks and iteratively updates them using online observations and the constraint. Experiments on synthetic data and on one of the largest e-commerce platforms in the world show that our algorithm is both effective and efficient, outperforming existing bandit algorithms by a large margin.
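The abstract does not spell out the learning loop, so the following is only a minimal, hypothetical sketch of the general idea: a neural-network "regulating valve" that blocks a suspected fraudulent seller with a learned probability, updated online with a penalty term standing in for the transaction-decline constraint, and an ensemble of networks as a stand-in for posterior uncertainty. All names (`PolicyNet`, `penalty_weight`, the reward and cost callbacks) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a constrained contextual-bandit "regulating valve".
# Not the paper's algorithm: the constraint is handled here with a simple
# Lagrangian-style penalty, and exploration with a small network ensemble.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Maps a seller/context feature vector to a blocking probability."""
    def __init__(self, context_dim: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(context_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

# A set of networks; each round one member is sampled to pick the action,
# loosely in the spirit of ensemble sampling.
context_dim, ensemble_size, penalty_weight = 16, 5, 1.0
ensemble = [PolicyNet(context_dim) for _ in range(ensemble_size)]
optims = [torch.optim.Adam(m.parameters(), lr=1e-3) for m in ensemble]

def step(context, fraud_reward_fn, transaction_cost_fn):
    """One online round: act, observe, and update under the penalty."""
    k = torch.randint(ensemble_size, (1,)).item()
    model, opt = ensemble[k], optims[k]

    p_block = model(context)                 # learned blocking probability
    block = torch.bernoulli(p_block).item()  # sampled action: block or not

    reward = fraud_reward_fn(block)          # e.g. reduction in fake orders (assumed signal)
    cost = transaction_cost_fn(block)        # e.g. lost legitimate sales (assumed signal)

    # Policy-gradient style update; the constraint on total transaction
    # decline enters only as a penalty on the observed cost.
    log_prob = torch.log(p_block if block else 1.0 - p_block)
    loss = -(reward - penalty_weight * cost) * log_prob
    opt.zero_grad()
    loss.backward()
    opt.step()
    return block, reward, cost
```

In a deployment one would feed `step` the per-seller context features each time a seller is about to be exposed to a buyer, and tune `penalty_weight` (or adapt it online) so that the cumulative transaction decline stays within the allowed budget; how the paper actually enforces the constraint is not specified in the abstract.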
