Correlation-aware Cooperative Multigroup Broadcast 360° Video Delivery Network: A Hierarchical Deep Reinforcement Learning Approach

Given the stringent requirement of receiving video from unmanned aerial vehicles (UAVs) anywhere in the stadium at sports events and the significantly high per-cell throughput needed for video transmission to virtual reality (VR) users, a promising solution is a cell-free multi-group broadcast (CF-MB) network with cooperative reception and broadcast access points (APs). To exploit the benefit of broadcasting user-correlated, decode-dependent video resources to spatially correlated VR users, the network should dynamically schedule the video and cluster APs into virtual cells, each serving a different group of VR users with overlapping video requests. By decomposing the problem into scheduling and association sub-problems, we first introduce conventional non-learning-based scheduling and association algorithms, and a centralized deep reinforcement learning (DRL) association approach based on the Rainbow agent with a convolutional neural network (CNN) to generate decisions from observations. To reduce its complexity, we then decompose the association problem into multiple sub-problems, resulting in a networked distributed partially observable Markov decision process (ND-POMDP). To solve it, we propose a multi-agent DRL algorithm. To jointly solve the coupled association and scheduling problems, we further develop a hierarchical federated DRL algorithm with the scheduler as the meta-controller and the association as the controller. Our simulation results show that our CF-MB network can effectively handle real-time video transmission from UAVs to VR users. Our proposed learning architectures are effective and scalable for a high-dimensional cooperative association problem with increasing numbers of APs and VR users. Our proposed algorithms also significantly outperform non-learning-based methods.
