Neural Network Based Reinforcement Learning Acceleration on FPGA Platforms

Deep Q-learning (DQN) is a recently proposed reinforcement learning algorithm in which a neural network serves as a non-linear approximator of the value function. The exploration-exploitation mechanism allows training and prediction of the neural network to run simultaneously while the agent interacts with its environment. Agents often act autonomously on battery power, so training and prediction must occur within the agent under a limited power budget. In this work, we propose an FPGA acceleration system for Neural Network Q-learning (NNQL). The proposed system is highly flexible due to its support for run-time network parameterization, which allows neuroevolution algorithms to dynamically restructure the network to achieve better learning results. Additionally, a new processing element design makes the system's power consumption adaptive to the network size. On test cases with hidden layer sizes ranging from 32 to 16384, the proposed system achieves a 7x to 346x speedup over a GPU implementation and a 22x to 77x speedup over a hand-coded CPU counterpart.
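To make the training/prediction interplay described above concrete, the following is a minimal software sketch of neural-network Q-learning with an epsilon-greedy exploration policy. All network sizes, hyperparameters, and the toy environment are illustrative assumptions; this is a generic NumPy model of NNQL, not the paper's FPGA design.

```python
import numpy as np

# Illustrative sizes and hyperparameters (assumptions, not from the paper).
N_STATES, N_ACTIONS, N_HIDDEN = 4, 2, 32
GAMMA, LR, EPSILON = 0.9, 0.05, 0.2

rng = np.random.default_rng(0)

# One-hidden-layer network: one-hot state -> Q-value per action.
W1 = rng.normal(0.0, 0.1, (N_STATES, N_HIDDEN))
W2 = rng.normal(0.0, 0.1, (N_HIDDEN, N_ACTIONS))

def q_values(state):
    """Forward pass (prediction): Q(s, a) for every action a."""
    h = np.tanh(state @ W1)
    return h, h @ W2

def select_action(state):
    """Epsilon-greedy: explore with probability EPSILON, else exploit."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    _, q = q_values(state)
    return int(np.argmax(q))

def train_step(state, action, reward, next_state):
    """One SGD step toward the Q-learning target r + gamma * max_a' Q(s', a')."""
    global W1, W2
    h, q = q_values(state)
    _, q_next = q_values(next_state)
    target = reward + GAMMA * np.max(q_next)
    err = np.zeros(N_ACTIONS)
    err[action] = target - q[action]        # TD error on the taken action only
    # Backpropagate the TD error through the two layers.
    grad_W2 = np.outer(h, err)
    grad_h = W2 @ err
    grad_W1 = np.outer(state, grad_h * (1.0 - h * h))  # tanh derivative
    W1 += LR * grad_W1
    W2 += LR * grad_W2

# Toy single-state environment: action 1 always yields reward 1, action 0 yields 0.
s = np.eye(N_STATES)[0]
for _ in range(1000):
    a = select_action(s)          # prediction drives action selection...
    r = 1.0 if a == 1 else 0.0
    train_step(s, a, r, s)        # ...while training updates the same network

_, q = q_values(s)
```

Note how each interaction step interleaves a forward pass (prediction) with a backward pass (training) on the same network, which is the workload the FPGA system must sustain on both paths.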
