Many machine learning systems are built to solve difficult planning problems, yet most adopt a one-size-fits-all approach across tasks: they waste precious computing resources on simple planning problems while under-investing in complex ones. This motivates a new framework that does not learn a single, fixed strategy, but instead introduces a series of decision controllers that resolve planning tasks by learning to construct, predict, and evaluate plans. We therefore propose a forward-looking, imagination-based planning framework combined with a Prioritized-Replay Double DQN: a model-based sequential decision controller that determines how many iterations of the decision-making process to run and which model to consult in each iteration. Before committing to any primitive action, the controller can imagine a bounded number of steps ahead from the current state, and evaluate candidate actions with its model-based imagination. All imagined actions and outcomes are iteratively aggregated into a "plan environment", which allows the agent to test alternative imagined actions and to flexibly apply a learned policy in previously imagined states. On top of this, a prioritized replay scheme adjusts the sampling weights to improve training efficiency, so the method achieves a lower overall cost than traditional fixed-strategy approaches, counting both task loss and computational cost.
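The two learning components named above, prioritized experience replay and Double DQN target computation, can be sketched as follows. This is a minimal illustrative implementation, not the paper's actual code; the class and function names, hyperparameter defaults, and the use of NumPy arrays for Q-values are all assumptions made for the example.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Proportional prioritized replay (illustrative sketch): transitions
    are sampled with probability p_i^alpha / sum_j p_j^alpha, and
    importance-sampling weights correct the induced bias."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling
        self.data = []
        self.priorities = []
        self.pos = 0                # circular write position

    def add(self, transition):
        # New transitions get the current maximum priority so that
        # they are sampled at least once before being down-weighted.
        max_p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(max_p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        scaled = np.asarray(self.priorities) ** self.alpha
        probs = scaled / scaled.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights, normalized so the largest is 1.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # Priority is the absolute TD error plus a small epsilon
        # so no transition's probability collapses to zero.
        for i, err in zip(idx, td_errors):
            self.priorities[i] = abs(err) + eps

def double_dqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """Double DQN target: the online network selects the argmax action,
    while the target network evaluates it, reducing overestimation bias."""
    best = np.argmax(q_online_next, axis=1)
    bootstrap = q_target_next[np.arange(len(best)), best]
    return rewards + gamma * (1.0 - dones) * bootstrap
```

In a training loop, TD errors from `double_dqn_targets` would be fed back via `update_priorities`, so that surprising transitions are replayed more often, which is the efficiency gain the abstract attributes to prioritized replay.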