Predicting the movement of objects while a learning agent's actions interact with the dynamics of the scene remains a key challenge in robotics. We propose a multi-layer Long Short-Term Memory (LSTM) autoencoder network that predicts future frames for a robot navigating in a dynamic environment with moving obstacles. The autoencoder is composed of a state- and action-conditioned decoder network that reconstructs future video frames, conditioned on the action taken by the agent. The input image frames are first transformed into low-dimensional feature vectors by a pre-trained encoder network and then reconstructed by the LSTM autoencoder to generate the future frames. A virtual environment, based on the OpenAI Gym framework for robotics, is used to gather training data and to test the proposed network. Initial experiments show promising results, indicating that the predicted frames could be used by an appropriate reinforcement learning framework in the future to navigate around dynamic obstacles.
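The architecture described above can be sketched as follows. This is a minimal, untrained numpy illustration, not the authors' implementation: the pre-trained encoder is stood in for by a fixed linear projection, the multi-layer LSTM is reduced to a single hand-rolled cell, and all dimensions, weight initializations, and the one-hot action encoding are hypothetical choices made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: flattened frame, feature vector, hidden state, action.
FRAME, FEAT, HID, ACT = 64, 16, 32, 2

# Stand-in for the pre-trained encoder: a fixed linear projection + tanh.
W_enc = rng.standard_normal((FEAT, FRAME)) * 0.1

def encode(frame):
    return np.tanh(W_enc @ frame)

class LSTMCell:
    """Minimal single-layer LSTM cell (the paper uses a multi-layer network)."""
    def __init__(self, in_dim, hid):
        self.W = rng.standard_normal((4 * hid, in_dim + hid)) * 0.1
        self.b = np.zeros(4 * hid)
    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, o, g = np.split(z, 4)
        sig = lambda v: 1.0 / (1.0 + np.exp(-v))
        c = sig(f) * c + sig(i) * np.tanh(g)   # update cell state
        h = sig(o) * np.tanh(c)                # new hidden state
        return h, c

# Action-conditioned decoder: maps [hidden state, action] -> next frame.
W_dec = rng.standard_normal((FRAME, HID + ACT)) * 0.1

def decode(h, action):
    return W_dec @ np.concatenate([h, action])

def predict_future(frames, actions, horizon):
    """Encode the observed frames, then roll the LSTM forward in free-running
    mode, conditioning each predicted frame on the planned action and feeding
    the prediction back in as the next input."""
    cell = LSTMCell(FEAT, HID)
    h, c = np.zeros(HID), np.zeros(HID)
    for frame in frames:                       # consume observed sequence
        h, c = cell.step(encode(frame), h, c)
    preds = []
    for a in actions[:horizon]:                # predict future frames
        pred = decode(h, a)
        preds.append(pred)
        h, c = cell.step(encode(pred), h, c)   # feed prediction back
    return preds

obs = [rng.standard_normal(FRAME) for _ in range(4)]   # 4 observed frames
acts = [np.eye(ACT)[i % ACT] for i in range(3)]        # one-hot planned actions
future = predict_future(obs, acts, horizon=3)
print(len(future), future[0].shape)  # 3 (64,)
```

In a trained model the predicted frames would then serve as the lookahead signal for the reinforcement learner; here the weights are random, so the sketch only demonstrates the data flow.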