Auxiliary-task Based Deep Reinforcement Learning for Participant Selection Problem in Mobile Crowdsourcing

In mobile crowdsourcing (MCS), the platform selects participants to complete location-aware tasks posted by recruiters, aiming to achieve multiple goals (e.g., profit maximization, energy efficiency, and fairness). However, different MCS systems pursue different goals, and even a single system may face conflicting goals. It is therefore crucial to design a participant selection algorithm that generalizes across MCS systems and balances multiple goals. To this end, we formulate participant selection as a reinforcement learning problem and propose a novel method, auxiliary-task based deep reinforcement learning (ADRL). We use a transformer to extract representations from the context of the MCS system and a pointer network to handle the combinatorial selection problem. To improve sample efficiency, we adopt an auxiliary training task that teaches the network to predict the imminent tasks from recruiters, which facilitates representation learning in the deep model. Additionally, we release a simulated environment for a specific MCS task, ride-sharing, and conduct extensive performance evaluations in this environment. The experimental results demonstrate that ADRL outperforms well-recognized baselines and improves sample efficiency across various settings.
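To make the described pipeline concrete, below is a minimal sketch of such a policy network in PyTorch: a transformer encoder embeds the candidate participants, a pointer-style attention head scores them for selection, and an auxiliary head predicts features of the imminent tasks. This is an illustration under stated assumptions, not the authors' implementation; all module names, layer sizes, and the choice of a regression target for the auxiliary task are hypothetical.

```python
import torch
import torch.nn as nn

class ParticipantSelector(nn.Module):
    """Illustrative ADRL-style policy: transformer encoder over MCS context,
    pointer head for participant selection, auxiliary head predicting
    imminent-task features. Sizes and heads are assumptions."""

    def __init__(self, feat_dim: int, d_model: int = 128, n_heads: int = 4,
                 n_layers: int = 2, task_dim: int = 8):
        super().__init__()
        self.embed = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Pointer head: a single learned query attends over candidate embeddings.
        self.query = nn.Parameter(torch.randn(d_model))
        # Auxiliary head: regress features of the next tasks from recruiters.
        self.aux_head = nn.Linear(d_model, task_dim)

    def forward(self, candidates: torch.Tensor):
        # candidates: (batch, n_candidates, feat_dim)
        h = self.encoder(self.embed(candidates))        # contextual embeddings
        scores = (h @ self.query) / h.size(-1) ** 0.5   # pointer logits per candidate
        policy = torch.distributions.Categorical(logits=scores)
        aux_pred = self.aux_head(h.mean(dim=1))         # pooled context -> task forecast
        return policy, aux_pred

def loss_fn(policy, action, advantage, aux_pred, next_task_feats, aux_weight=0.5):
    """Hypothetical training objective: policy-gradient loss plus an
    auxiliary regression loss on the observed imminent-task features."""
    pg_loss = -(policy.log_prob(action) * advantage).mean()
    aux_loss = nn.functional.mse_loss(aux_pred, next_task_feats)
    return pg_loss + aux_weight * aux_loss
```

In this sketch the auxiliary loss shares the encoder with the selection policy, so gradient signal from task prediction shapes the embeddings even when reward is sparse, which is the mechanism the abstract credits for the sample-efficiency gains.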
