DeepWiERL: Bringing Deep Reinforcement Learning to the Internet of Self-Adaptive Things

Recent work has demonstrated that cutting-edge advances in deep reinforcement learning (DRL) may be leveraged to empower wireless devices with the much-needed ability to "sense" current spectrum and network conditions and "react" in real time by either exploiting known optimal actions or exploring new actions. Yet whether real-time DRL can be applied at all in the resource-challenged embedded IoT domain, and how to design IoT-tailored DRL systems and architectures, remain mostly uncharted territory. This paper bridges the existing gap between the extensive theoretical research on wireless DRL and its system-level applications by presenting Deep Wireless Embedded Reinforcement Learning (DeepWiERL), a general-purpose, hybrid software/hardware DRL framework specifically tailored for embedded IoT wireless devices. DeepWiERL provides abstractions, circuits, software structures and drivers to support the training and real-time execution of state-of-the-art DRL algorithms on the device’s hardware. Moreover, DeepWiERL includes a novel supervised DRL model selection and bootstrap (S-DMSB) technique that leverages transfer learning and high-level synthesis (HLS) circuit design to orchestrate a neural network architecture that satisfies hardware and application throughput constraints and speeds up the convergence of the DRL algorithm. Experimental evaluation on a fully custom software-defined radio testbed (i) proves for the first time the feasibility of real-time DRL-based algorithms on a real-world wireless platform with multiple channel conditions; (ii) shows that DeepWiERL supports 16x the data rate and consumes 14x less energy than a software-based implementation; and (iii) indicates that S-DMSB may improve the DRL convergence time by 6x and increase the obtained reward by 45% if prior channel knowledge is available.
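
The choice between exploiting a known good action and exploring a new one can be illustrated with a minimal epsilon-greedy sketch. This is a common textbook strategy shown only for illustration, not necessarily the policy DeepWiERL uses; the function name `select_action`, the parameter `epsilon`, and the example Q-values are all assumptions.

```python
import random

def select_action(q_values, epsilon, rng=random):
    """Pick an action index from a list of estimated action values.

    With probability epsilon a random action is explored; otherwise the
    action with the highest estimated value is exploited.
    """
    if rng.random() < epsilon:
        # Explore: choose uniformly among all actions.
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the largest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Hypothetical value estimates for three candidate transmit configurations.
example_q_values = [0.2, 0.9, 0.4]
greedy_action = select_action(example_q_values, epsilon=0.0)
```

With `epsilon=0.0` the sketch always exploits, while a larger `epsilon` trades some immediate reward for the chance of discovering better actions.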
