Paper information - DeepWiERL: Bringing Deep Reinforcement Learning to the Internet of Self-Adaptive Things

Recent work has demonstrated that cutting-edge advances in deep reinforcement learning (DRL) may be leveraged to empower wireless devices with the much-needed ability to "sense" current spectrum and network conditions and "react" in real time by either exploiting known optimal actions or exploring new actions. Yet understanding whether real-time DRL can be applied at all in the resource-challenged embedded IoT domain, as well as designing IoT-tailored DRL systems and architectures, remains mostly uncharted territory. This paper bridges the existing gap between the extensive theoretical research on wireless DRL and its system-level applications by presenting Deep Wireless Embedded Reinforcement Learning (DeepWiERL), a general-purpose, hybrid software/hardware DRL framework specifically tailored for embedded IoT wireless devices. DeepWiERL provides abstractions, circuits, software structures and drivers to support the training and real-time execution of state-of-the-art DRL algorithms on the device’s hardware. Moreover, DeepWiERL includes a novel supervised DRL model selection and bootstrap (S-DMSB) technique that leverages transfer learning and high-level synthesis (HLS) circuit design to orchestrate a neural network architecture that satisfies hardware and application throughput constraints and speeds up the convergence of the DRL algorithm. Experimental evaluation on a fully-custom software-defined radio testbed (i) proves for the first time the feasibility of real-time DRL-based algorithms on a real-world wireless platform with multiple channel conditions; (ii) shows that DeepWiERL supports 16x the data rate and consumes 14x less energy than a software-based implementation; and (iii) indicates that S-DMSB may improve the DRL convergence time by 6x and increase the obtained reward by 45% if prior channel knowledge is available.

Tommaso Melodia | Francesco Restuccia
