Deep Learning Approximation for Stochastic Control Problems

Many real-world stochastic control problems suffer from the "curse of dimensionality". To overcome this difficulty, we develop a deep learning approach that directly solves high-dimensional stochastic control problems based on Monte Carlo sampling. We approximate the time-dependent controls as feedforward neural networks and stack these networks together through model dynamics. The objective function for the control problem plays the role of the loss function for the deep neural network. We test this approach using examples from the areas of optimal trading and energy storage. Our results suggest that the algorithm presented here achieves satisfactory accuracy and, at the same time, can handle rather high-dimensional problems.
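
The structure described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the one-dimensional quadratic toy problem, all constants, and the names `init_params`, `control`, `loss`, and `train` are assumptions made for illustration, and central finite differences stand in for backpropagation only to keep the example free of dependencies. Each time step has its own small feedforward network, the networks are chained through the dynamics, and the Monte Carlo estimate of the expected cost is used directly as the training loss.

```python
import math
import random

T = 3          # number of time steps (assumed toy horizon)
HIDDEN = 3     # hidden units per control network (assumed)
SIGMA = 0.1    # noise scale in the toy dynamics (assumed)
N_PATHS = 16   # Monte Carlo sample size (assumed)
PER_STEP = 3 * HIDDEN + 1  # w1, b1, w2 (HIDDEN each) plus one output bias


def init_params(rng):
    """One small feedforward network per time step, flattened into one list."""
    return [rng.uniform(-0.1, 0.1) for _ in range(T * PER_STEP)]


def control(params, t, x):
    """Control a_t = net_t(x_t): one tanh hidden layer, linear output."""
    base = t * PER_STEP
    w1 = params[base:base + HIDDEN]
    b1 = params[base + HIDDEN:base + 2 * HIDDEN]
    w2 = params[base + 2 * HIDDEN:base + 3 * HIDDEN]
    b2 = params[base + 3 * HIDDEN]
    return b2 + sum(w2[j] * math.tanh(w1[j] * x + b1[j]) for j in range(HIDDEN))


def loss(params, noise, x0=1.0):
    """Monte Carlo estimate of the expected cost; this is the training loss."""
    total = 0.0
    for path in noise:
        x, cost = x0, 0.0
        for t in range(T):
            a = control(params, t, x)
            cost += x * x + a * a          # running cost (toy quadratic)
            x = x + a + SIGMA * path[t]    # dynamics link network t to network t+1
        total += cost + x * x              # terminal cost
    return total / len(noise)


def train(steps=50, lr=0.02, eps=1e-4, seed=0):
    """Gradient descent on the sampled cost, using fixed noise samples."""
    rng = random.Random(seed)
    noise = [[rng.gauss(0.0, 1.0) for _ in range(T)] for _ in range(N_PATHS)]
    params = init_params(rng)
    initial = loss(params, noise)
    for _ in range(steps):
        grad = []
        for i in range(len(params)):
            # Central finite difference: a stand-in for backpropagation.
            up = params[:i] + [params[i] + eps] + params[i + 1:]
            down = params[:i] + [params[i] - eps] + params[i + 1:]
            grad.append((loss(up, noise) - loss(down, noise)) / (2 * eps))
        params = [p - lr * g for p, g in zip(params, grad)]
    return initial, loss(params, noise), params
```

Calling `train()` returns the sampled cost before and after training along with the learned parameters. In a realistic setting an automatic-differentiation framework and a stochastic optimizer would replace the finite-difference loop, and fresh noise samples would be drawn at each iteration.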

Weinan E | Jiequn Han
