Reinforcement Learning from Simulated Environments: An Encoder-Decoder Framework

Reinforcement learning (RL) is used for sequential decision-making tasks such as operating and maintaining manufacturing systems. In RL, the system is modeled as a Markov decision process with states, actions, rewards, and policies, and a policy is learned through repeated interaction with the environment. When an RL agent cannot interact with the real system because of time and cost constraints, a simulation of the system may be used in its place. Unfortunately, most simulations are not built for the purpose of interacting with an RL agent, while simulations built expressly to serve as the environment are often structured only around the defined state, action, and reward, and can lack fidelity, detail, and accuracy. We propose a general framework for bridging the worlds of simulation and RL. This is accomplished by placing "interpreters" between the simulation and the RL agent that translate information into a form intelligible to each side.
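The interpreter idea can be sketched concretely: an encoder maps the simulation's native output into the agent's MDP state and reward, and a decoder maps the agent's discrete action back into a simulation command. The sketch below is an illustrative assumption, not the paper's implementation; all names (`ToyMachineSim`, `InterpretedEnv`, the wear/throughput dynamics) are hypothetical stand-ins.

```python
# Hypothetical sketch: interpreters sit between a simulation and an RL
# agent, translating each side's native representation into the other's.

class ToyMachineSim:
    """Stand-in simulation with its own native command/output interface."""
    def __init__(self):
        self.wear = 0.0
        self.throughput = 100.0

    def step(self, command):
        if command == "RUN":
            self.wear += 0.1
            self.throughput = 100.0 * (1.0 - self.wear)
        elif command == "MAINTAIN":
            self.wear = 0.0
            self.throughput = 0.0  # machine offline during maintenance
        return {"wear": self.wear, "throughput": self.throughput}


def encode_state(sim_output):
    """State interpreter: discretize raw wear into 3 MDP state levels."""
    wear = sim_output["wear"]
    return 0 if wear < 0.3 else (1 if wear < 0.7 else 2)


def decode_action(agent_action):
    """Action interpreter: map the agent's discrete action to a sim command."""
    return {0: "RUN", 1: "MAINTAIN"}[agent_action]


def encode_reward(sim_output):
    """Reward interpreter: derive the scalar reward from simulation output."""
    return sim_output["throughput"] - 50.0 * (sim_output["wear"] > 0.7)


class InterpretedEnv:
    """Gym-style environment assembled from the simulation plus interpreters."""
    def __init__(self, sim):
        self.sim = sim

    def step(self, action):
        out = self.sim.step(decode_action(action))
        return encode_state(out), encode_reward(out)


env = InterpretedEnv(ToyMachineSim())
state, reward = env.step(0)  # the agent acts in its own vocabulary
```

The agent only ever sees discrete states, actions, and scalar rewards, while the simulation keeps its own interface untouched; swapping in a higher-fidelity simulation would require changing only the interpreter functions, not the agent.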
