Impression Allocation for Combating Fraud in E-commerce Via Deep Reinforcement Learning with Action Norm Penalty

Conducting fraudulent transactions has become popular among e-commerce sellers as a way to make their products appear more favorable to the platform and to buyers, which decreases the utilization efficiency of buyer impressions and jeopardizes the business environment. Fraud detection techniques are necessary but not sufficient for the platform, since it is impossible to recognize all fraudulent transactions. In this paper, we focus on improving the platform’s impression allocation mechanism to maximize its profit and reduce sellers’ fraudulent behaviors simultaneously. First, we learn a seller behavior model that predicts sellers’ fraudulent behaviors from real-world data provided by one of the largest e-commerce companies in the world. Then, we formulate the platform’s impression allocation problem as a continuous Markov Decision Process (MDP) with an unbounded action space. To make the actions executable in practice and to facilitate learning, we propose DDPG-ANP, a novel deep reinforcement learning algorithm that introduces an action norm penalty into the reward function. Experimental results show that our algorithm significantly outperforms existing baselines in terms of scalability and solution quality.
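
The core idea of the action norm penalty is to shape the reward so that the agent is discouraged from choosing arbitrarily large actions in the unbounded action space. The sketch below is a minimal illustration of that shaping step only, assuming a quadratic penalty with a hypothetical coefficient `penalty_coef`; the paper's exact penalty form and how it is wired into the DDPG update are not specified here.

```python
import numpy as np

def shaped_reward(env_reward, action, penalty_coef=0.01):
    """Reward shaping with an action norm penalty (illustrative only).

    Subtracting a penalty proportional to the squared L2 norm of the
    action discourages unboundedly large impression-allocation
    adjustments, keeping learned actions executable in practice.
    """
    return env_reward - penalty_coef * float(np.sum(np.square(action)))

# Example: a raw platform reward of 1.0 and a large continuous action
action = np.array([3.0, -4.0])
print(shaped_reward(1.0, action))  # 1.0 - 0.01 * 25 = 0.75
```

In a DDPG-style training loop, this shaped reward would simply replace the raw environment reward stored in the replay buffer, leaving the actor and critic updates unchanged.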
