Deep Reinforcement Learning With Quantum-Inspired Experience Replay

In this paper, a novel training paradigm inspired by quantum computation is proposed for deep reinforcement learning (DRL) with experience replay. In contrast to the traditional experience replay mechanism in DRL, the proposed deep reinforcement learning with quantum-inspired experience replay (DRL-QER) adaptively chooses experiences (also called transitions) from the replay buffer according to the complexity of each experience and the number of times it has been replayed, to achieve a balance between exploration and exploitation. In DRL-QER, transitions are first formulated in quantum representations, and then the preparation operation and the depreciation operation are performed on the transitions. In this process, the preparation operation reflects the relationship between the temporal difference errors (TD-errors) and the importance of the experiences, while the depreciation operation accounts for how often a transition has been replayed in order to ensure the diversity of the sampled transitions. The experimental results on Atari 2600 games show that DRL-QER outperforms state-of-the-art algorithms such as DRL-PER and DCRL on most of these games with improved training efficiency, and is also applicable to memory-based DRL approaches such as double networks and dueling networks.
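The sampling idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the update rules, so everything specific here is an assumption made for illustration, including the class name QuantumInspiredBuffer, the single-angle qubit encoding cos(theta)|0> + sin(theta)|1>, the tanh mapping from TD-error to angle, and the fixed depreciation step. The sketch only shows the qualitative behaviour: a larger TD-error raises a transition's replay probability (preparation), and each replay lowers it again (depreciation).

```python
import math
import random


class QuantumInspiredBuffer:
    """Toy replay buffer; each transition carries one qubit-style angle.

    Assumed encoding: state = cos(theta)|0> + sin(theta)|1>, and the chance
    of being replayed is proportional to sin(theta) ** 2.
    """

    def __init__(self, prep_scale=0.5, dep_step=0.05, theta_min=0.05):
        self.transitions = []
        self.thetas = []
        self.replay_counts = []
        self.prep_scale = prep_scale  # hypothetical TD-error sensitivity
        self.dep_step = dep_step      # hypothetical per-replay angle decrease
        self.theta_min = theta_min    # keeps every probability above zero

    def add(self, transition):
        self.transitions.append(transition)
        self.thetas.append(math.pi / 4)  # start in a uniform superposition
        self.replay_counts.append(0)

    def prepare(self, index, td_error):
        # Preparation (assumed form): map |TD-error| to an angle in
        # [pi/4, pi/2), so larger errors give a larger |1> amplitude.
        boost = math.tanh(self.prep_scale * abs(td_error))
        self.thetas[index] = math.pi / 4 + (math.pi / 4) * boost

    def depreciate(self, index):
        # Depreciation (assumed form): each replay rotates the state back
        # toward |0>, so frequently replayed transitions are picked less.
        self.replay_counts[index] += 1
        self.thetas[index] = max(self.theta_min,
                                 self.thetas[index] - self.dep_step)

    def probabilities(self):
        weights = [math.sin(t) ** 2 for t in self.thetas]
        total = sum(weights)
        return [w / total for w in weights]

    def sample(self, batch_size, rng=random):
        # Draw indices with replacement, then depreciate each draw.
        picked = rng.choices(range(len(self.transitions)),
                             weights=self.probabilities(), k=batch_size)
        for i in picked:
            self.depreciate(i)
        return picked
```

In a training loop under these assumptions, prepare would be called with the fresh TD-error after a transition is stored or replayed, and sample would supply the indices of the next minibatch.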
