Review, Analyze, and Design a Comprehensive Deep Reinforcement Learning Framework

Reinforcement learning (RL) has emerged as a standard approach for building intelligent systems in which multiple self-operating agents collectively accomplish a designated task. RL has attracted considerable attention since the introduction of deep learning, which makes it feasible for RL to operate in high-dimensional environments. However, current research has diverged into different directions, such as multi-agent learning, multi-objective learning, and human-machine interaction. Therefore, in this paper, we propose a comprehensive software architecture that not only plays a vital role in designing a connect-the-dots deep RL architecture but also provides a guideline for developing a realistic RL application in a short time span. By adopting the proposed architecture, software managers can anticipate the challenges of designing a deep RL-based system. As a result, they can expedite the design process and actively control every stage of software development, which is especially critical in agile development environments. For this reason, we designed a deep RL-based framework that ensures flexibility, robustness, and scalability. Finally, to ensure generality, the proposed architecture does not depend on a specific RL algorithm, network configuration, number of agents, or type of agents.
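The closing claim of the abstract, that the architecture is independent of the RL algorithm, the number of agents, and the type of agents, amounts to writing the training loop against interfaces rather than against concrete learners. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' framework, and every name in it (`Agent`, `RandomAgent`, `QLearningAgent`, `TimingEnv`, `run_episode`) is hypothetical. A tabular Q-table stands in for a neural network, so network configuration is not modeled, and the toy task (reward for choosing action `t % n_actions` at step `t`) is invented purely for illustration.

```python
import random
from abc import ABC, abstractmethod
from collections import namedtuple
from typing import Dict, List, Sequence, Tuple

# One agent's experience tuple (s, a, r, s', done).
Transition = namedtuple("Transition", "state action reward next_state done")


class Agent(ABC):
    """Hypothetical algorithm-agnostic interface: the loop only calls act() and learn()."""

    @abstractmethod
    def act(self, state: int) -> int: ...

    @abstractmethod
    def learn(self, transition: Transition) -> None: ...


class RandomAgent(Agent):
    """Non-learning baseline: uniform random actions, experience ignored."""

    def __init__(self, n_actions: int, seed: int = 0) -> None:
        self.n_actions, self.rng = n_actions, random.Random(seed)

    def act(self, state: int) -> int:
        return self.rng.randrange(self.n_actions)

    def learn(self, transition: Transition) -> None:
        pass  # still satisfies the interface, so the loop needs no special case


class QLearningAgent(Agent):
    """Tabular epsilon-greedy Q-learning; the Q-table stands in for a neural network."""

    def __init__(self, n_actions: int, alpha: float = 0.5, gamma: float = 0.9,
                 epsilon: float = 0.1, seed: int = 0) -> None:
        self.n_actions, self.alpha, self.gamma, self.epsilon = n_actions, alpha, gamma, epsilon
        self.rng = random.Random(seed)
        self.q: Dict[Tuple[int, int], float] = {}  # unseen (state, action) pairs default to 0.0

    def value(self, state: int, action: int) -> float:
        return self.q.get((state, action), 0.0)

    def greedy(self, state: int) -> int:
        # max() returns the first maximal action, so ties go to the lowest index
        return max(range(self.n_actions), key=lambda a: self.value(state, a))

    def act(self, state: int) -> int:
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)  # explore
        return self.greedy(state)  # exploit

    def learn(self, tr: Transition) -> None:
        # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)); no bootstrap after a terminal step
        future = 0.0 if tr.done else max(self.value(tr.next_state, a) for a in range(self.n_actions))
        old = self.value(tr.state, tr.action)
        self.q[(tr.state, tr.action)] = old + self.alpha * (tr.reward + self.gamma * future - old)


class TimingEnv:
    """Toy multi-agent task: at step t, each agent is rewarded for choosing action t % n_actions."""

    def __init__(self, n_agents: int, n_actions: int = 2, horizon: int = 8) -> None:
        self.n_agents, self.n_actions, self.horizon, self.t = n_agents, n_actions, horizon, 0

    def reset(self) -> List[int]:
        self.t = 0
        return [self.t] * self.n_agents  # every agent observes the current step index

    def step(self, actions: Sequence[int]) -> Tuple[List[int], List[float], bool]:
        target = self.t % self.n_actions
        rewards = [1.0 if a == target else 0.0 for a in actions]
        self.t += 1
        return [self.t] * self.n_agents, rewards, self.t >= self.horizon


def run_episode(env, agents: Sequence[Agent]) -> List[float]:
    """Generic loop; env is any object with reset() and step(actions), duck-typed like TimingEnv."""
    states, done = env.reset(), False
    returns = [0.0] * len(agents)
    while not done:
        actions = [agent.act(s) for agent, s in zip(agents, states)]
        next_states, rewards, done = env.step(actions)
        for i, agent in enumerate(agents):
            agent.learn(Transition(states[i], actions[i], rewards[i], next_states[i], done))
            returns[i] += rewards[i]
        states = next_states
    return returns


env = TimingEnv(n_agents=3)
agents = [QLearningAgent(2, seed=1), QLearningAgent(2, seed=2), RandomAgent(2, seed=3)]
for _ in range(2000):
    last_returns = run_episode(env, agents)
print("returns of the final episode:", last_returns)
```

Because `run_episode` only calls `act()` and `learn()`, a learning and a non-learning agent can share one environment, and changing the number of agents requires no change to the loop. Under this assumption, a deep RL learner with its own network configuration would plug in the same way, by implementing those two methods.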
