Deep Reinforcement Learning: Framework, Applications, and Embedded Implementations

The recent breakthroughs of deep reinforcement learning (DRL) technique in Alpha Go and playing Atari have set a good example in handling large state and actions spaces of complicated control problems. The DRL technique is comprised of (i) an offline deep neural network (DNN) construction phase, which derives the correlation between each state-action pair of the system and its value function, and (ii) an online deep Q-learning phase, which adaptively derives the optimal action and updates value estimates. In this paper, we first present the general DRL framework, which can be widely utilized in many applications with different optimization objectives. This is followed by the introduction of three specific applications: the cloud computing resource allocation problem, the residential smart grid task scheduling problem, and building HVAC system optimal control problem. The effectiveness of the DRL technique in these three cyber-physical applications have been validated. Finally, this paper investigates the stochastic computing-based hardware implementations of the DRL framework, which consumes a significant improvement in area efficiency and power consumption compared with binary-based implementation counterparts.

[1]  Richard S. Sutton,et al.  Introduction to Reinforcement Learning , 1998 .

[2]  Wolf-Dietrich Weber,et al.  Power provisioning for a warehouse-sized computer , 2007, ISCA '07.

[3]  John P. Hayes,et al.  Survey of Stochastic Computing , 2013, TECS.

[4]  Ji Li,et al.  Negotiation-based task scheduling to minimize user’s electricity bills under dynamic energy prices , 2014, 2014 IEEE Online Conference on Green Communications (OnlineGreenComm).

[5]  Yuval Tassa,et al.  Continuous control with deep reinforcement learning , 2015, ICLR.

[6]  Le Yi Wang,et al.  VCONF: a reinforcement learning approach to virtual machines auto-configuration , 2009, ICAC '09.

[7]  Zhongfeng Wang,et al.  Design space exploration for hardware-efficient stochastic computing: A case study on discrete cosine transformation , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[8]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[9]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.

[10]  Naoya Onizawa,et al.  VLSI Implementation of Deep Neural Network Using Integral Stochastic Computing , 2017, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[11]  Michael O. Duff,et al.  Reinforcement Learning Methods for Continuous-Time Markov Decision Problems , 1994, NIPS.

[12]  Miao Hu,et al.  ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).

[13]  Brian R. Gaines,et al.  Stochastic Computing Systems , 1969 .

[14]  Howard C. Card,et al.  Stochastic Neural Computation I: Computational Elements , 2001, IEEE Trans. Computers.

[15]  Hado van Hasselt,et al.  Double Q-learning , 2010, NIPS.

[16]  Alex Graves,et al.  Playing Atari with Deep Reinforcement Learning , 2013, ArXiv.

[17]  Kiyoung Choi,et al.  Approximate de-randomizer for stochastic circuits , 2015, 2015 International SoC Design Conference (ISOCC).

[18]  David Silver,et al.  Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.

[19]  Brian R. Gaines,et al.  Stochastic computing , 1967, AFIPS '67 (Spring).

[20]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[21]  Kiyoung Choi,et al.  Dynamic energy-accuracy trade-off using stochastic computing in deep neural networks , 2016, 2016 53nd ACM/EDAC/IEEE Design Automation Conference (DAC).

[22]  Qinru Qiu,et al.  SC-DCNN: Highly-Scalable Deep Convolutional Neural Network using Stochastic Computing , 2016, ASPLOS.

[23]  Tianshu Wei,et al.  Deep reinforcement learning for building HVAC control , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).

[24]  Daniel Urieli,et al.  A learning agent for heat-pump thermostat control , 2013, AAMAS.

[25]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.