Deep Q-Learning With Q-Matrix Transfer Learning for Novel Fire Evacuation Environment

We focus on emergency evacuation, an important problem that could clearly benefit from reinforcement learning yet has remained largely unaddressed by it. Emergency evacuation is a complex task that is difficult to solve with reinforcement learning: an emergency situation is highly dynamic, with many changing variables and complex constraints that make it difficult to train on. In this paper, we propose the first fire evacuation environment for training reinforcement learning agents on evacuation planning. The environment is modelled as a graph capturing the building structure, and it incorporates realistic features such as fire spread, uncertainty and bottlenecks. We have implemented the environment in the OpenAI Gym format to facilitate future research. We also propose a new reinforcement learning approach that pretrains the network weights of a DQN-based agent to incorporate information on the shortest path to the exit. We achieve this by using tabular Q-learning to learn the shortest path on the building model's graph, and we transfer this information to the network by deliberately overfitting it on the resulting Q-matrix. The pretrained DQN model is then trained on the fire evacuation environment to generate the optimal evacuation path under time-varying conditions. We compare the proposed approach with state-of-the-art reinforcement learning algorithms such as PPO, VPG, SARSA, A2C and ACKTR. The results show that our method outperforms these models, including the original DQN-based models, by a large margin. Finally, we test our model on a large and complex real building consisting of 91 rooms, with the possibility of moving from any room to any other, giving 8281 actions. We use an attention-based mechanism to deal with this large action space. Our model achieves near-optimal performance on the real-world emergency environment.
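To make the two-step transfer concrete, below is a minimal sketch of the idea: tabular Q-learning first recovers the shortest path to the exit on a building graph, and a small network is then deliberately overfit on the resulting Q-matrix so its initial weights encode that path before fine-tuning on the full environment. The five-room graph, the MLP architecture and all hyperparameters here are illustrative assumptions, not the paper's actual building model or DQN.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical 5-room building graph as an adjacency matrix (toy example,
# not the paper's 91-room building); room 4 is the exit.
adjacency = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
])
n_rooms = adjacency.shape[0]
exit_room = 4

# --- Step 1: tabular Q-learning of the shortest path to the exit ---
Q = np.zeros((n_rooms, n_rooms))
alpha, gamma, episodes = 0.8, 0.9, 2000
rng = np.random.default_rng(0)

for _ in range(episodes):
    state = rng.integers(n_rooms)
    while state != exit_room:
        neighbours = np.flatnonzero(adjacency[state])
        action = rng.choice(neighbours)            # pure random exploration
        reward = 100.0 if action == exit_room else 0.0
        Q[state, action] += alpha * (
            reward + gamma * Q[action].max() - Q[state, action]
        )
        state = action

Q /= Q.max()  # normalise the learned Q-matrix

# --- Step 2: transfer by deliberately overfitting a network on Q ---
# One-hot room encodings as inputs; the tabular Q-rows are the targets.
states = torch.eye(n_rooms)
targets = torch.tensor(Q, dtype=torch.float32)

net = nn.Sequential(nn.Linear(n_rooms, 64), nn.ReLU(), nn.Linear(64, n_rooms))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

for _ in range(500):  # many passes over the same data: intentional overfit
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(states), targets)
    loss.backward()
    opt.step()

# The pretrained weights now encode the shortest-path policy; in the paper's
# setup, such a network would subsequently be fine-tuned on the fire
# evacuation environment (fire spread, uncertainty, bottlenecks).
```

Overfitting, normally avoided, is the point here: the Q-matrix is exact for the static graph, so memorising it gives the agent the shortest-path prior before the dynamic fire conditions are introduced.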
