Continuous Incentive Mechanism for D2D Content Sharing: A Deep Reinforcement Learning Approach

Device-to-device (D2D) communication-based content sharing is regarded as a promising way to offload traffic from cellular networks, but it requires incentive mechanisms to motivate mobile user equipment (UE) to participate. In this paper, we first propose an improved scoring mechanism that provides continuous incentives, and then study the impact of historical behavior on continuous motivation. Furthermore, to maintain continuous motivation while preserving the service quality of content sharing, we investigate how to set the weights of historical behavior and current status in score calculation. Because the objective concerns long-term performance and the network evolves randomly, this weight-setting problem is formulated as a stochastic dynamic programming (SDP) problem. To tackle the curse of dimensionality, a deep reinforcement learning (DRL) algorithm is employed for optimization. Simulation results show that, with DRL, the mechanism motivates content sharing continuously, improves the quality of service (QoS), and reduces the sharing cost.
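The abstract does not give the scoring formula, so the following is only a minimal sketch under stated assumptions: the score is a convex combination of a historical-behavior term and a current-status term, both normalized to [0, 1], and the history term is updated with exponential forgetting. The weight `w_hist` stands in for the quantity the DRL agent would choose at each step. The names `UEScore`, `update_score`, `w_hist`, and `decay` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class UEScore:
    """Hypothetical per-UE score state (an assumption, not the paper's definition)."""
    history: float = 0.0  # running summary of past sharing behavior, in [0, 1]
    score: float = 0.0    # score used for incentives, in [0, 1]


def update_score(ue: UEScore, current_status: float, w_hist: float,
                 decay: float = 0.9) -> float:
    """Weight historical behavior against current status.

    w_hist is the (assumed) action chosen by the DRL agent each time slot;
    decay is an assumed exponential-forgetting factor for the history term.
    """
    if not 0.0 <= w_hist <= 1.0:
        raise ValueError("w_hist must lie in [0, 1]")
    if not 0.0 <= current_status <= 1.0:
        raise ValueError("current_status must lie in [0, 1]")
    # Score for this slot: convex combination of the previous history and current status.
    ue.score = w_hist * ue.history + (1.0 - w_hist) * current_status
    # Fold the current status into the history used in the next slot.
    ue.history = decay * ue.history + (1.0 - decay) * current_status
    return ue.score


if __name__ == "__main__":
    ue = UEScore(history=0.5)
    print(update_score(ue, current_status=1.0, w_hist=0.5))  # 0.75
```

In this sketch a larger `w_hist` rewards UEs for sustained past participation, while a smaller one favors whoever is best placed to serve right now; the trade-off between these two effects is what the abstract describes the DRL agent as optimizing.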