Reinforcement Learning from Simulated Environments: An Encoder-Decoder Framework

Reinforcement learning (RL) is used for sequential decision-making tasks such as operating and maintaining manufacturing systems. In RL, the system is modeled as a Markov decision process with states, actions, rewards, and policies, and a policy is learned through repeated interaction with the environment. When an RL agent cannot interact with the real system because of time and cost constraints, a simulation of the system may be used in its place. Unfortunately, most simulations are not built for the purpose of interacting with an RL agent, while simulations built expressly to serve as the environment are often structured only around the defined state, action, and reward, and can lack fidelity, detail, and accuracy. We propose a general framework for bridging the worlds of simulation and RL. This is accomplished by placing "interpreters" between the simulation and the RL agent that translate information into a form intelligible to each side.
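The interpreter idea can be sketched concretely: an encoder maps the simulation's native output into the agent's MDP state and reward, and a decoder maps the agent's discrete action back into a simulation command. The sketch below is an illustrative assumption, not the paper's implementation; all names (`ToyMachineSim`, `InterpretedEnv`, the wear/throughput dynamics) are hypothetical stand-ins.

```python
# Hypothetical sketch: interpreters sit between a simulation and an RL
# agent, translating each side's native representation into the other's.

class ToyMachineSim:
    """Stand-in simulation with its own native command/output interface."""
    def __init__(self):
        self.wear = 0.0
        self.throughput = 100.0

    def step(self, command):
        if command == "RUN":
            self.wear += 0.1
            self.throughput = 100.0 * (1.0 - self.wear)
        elif command == "MAINTAIN":
            self.wear = 0.0
            self.throughput = 0.0  # machine offline during maintenance
        return {"wear": self.wear, "throughput": self.throughput}


def encode_state(sim_output):
    """State interpreter: discretize raw wear into 3 MDP state levels."""
    wear = sim_output["wear"]
    return 0 if wear < 0.3 else (1 if wear < 0.7 else 2)


def decode_action(agent_action):
    """Action interpreter: map the agent's discrete action to a sim command."""
    return {0: "RUN", 1: "MAINTAIN"}[agent_action]


def encode_reward(sim_output):
    """Reward interpreter: derive the scalar reward from simulation output."""
    return sim_output["throughput"] - 50.0 * (sim_output["wear"] > 0.7)


class InterpretedEnv:
    """Gym-style environment assembled from the simulation plus interpreters."""
    def __init__(self, sim):
        self.sim = sim

    def step(self, action):
        out = self.sim.step(decode_action(action))
        return encode_state(out), encode_reward(out)


env = InterpretedEnv(ToyMachineSim())
state, reward = env.step(0)  # the agent acts in its own vocabulary
```

The agent only ever sees discrete states, actions, and scalar rewards, while the simulation keeps its own interface untouched; swapping in a higher-fidelity simulation would require changing only the interpreter functions, not the agent.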
