Sequential frameworks for statistics-based value function representation in approximate dynamic programming

Dynamic programming (DP) was developed by Bellman in 1957 as a mathematical programming method for solving multistage decision problems. With advances in computational power, a family of methods known as approximate dynamic programming (ADP) has emerged. From a statistical perspective, more efficient design of experiments methods, such as orthogonal arrays (OAs) and number theoretic methods (NTMs), combined with flexible statistical modeling methods, such as multivariate adaptive regression splines (MARS) and neural networks (NNs), have enabled approximate solutions to higher-dimensional problems. This statistical perspective nonetheless retains the traditional DP solution approach of solving backward through the stages. By contrast, machine learning approaches developed in the artificial intelligence community approximately "learn" the DP solution through an iterative search. These learning-based methods go by various names, including reinforcement learning (RL), adaptive critics, and neuro-dynamic programming. They originated in theories of psychology and animal learning but have since become an important branch of machine learning. Compared with the ADP methods developed in the statistics and operations research communities, these methods learn the DP solution adaptively and incrementally through learning algorithms. However, the practical success of RL approaches remains limited by their extremely high computational cost. Because the RL approach to ADP is sequential in nature, this dissertation seeks to improve upon the statistical perspective by developing sequential approaches in the spirit of RL. Existing ADP methods assume fixed model structures for approximating the future value (or cost-to-go) function. In practice, this model structure is difficult to identify and in many cases requires a time-consuming trial-and-error process.
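For concreteness, the finite-horizon stochastic DP that both perspectives approximate, and the way the statistical perspective replaces the exact future value function with a fitted model, can be written as follows. The notation is an assumption for illustration, not the dissertation's: x_t is the state at stage t, u_t the decision, ε_t the random variables, c_t the stage cost, f_t the state transition, G the chosen model class (for example MARS, or an NN of fixed structure), and x_{t,1}, ..., x_{t,N} the N design points (for example an OA or NTM sample).

```latex
\begin{aligned}
F_t(\mathbf{x}_t) &= \min_{\mathbf{u}_t} \; \mathbb{E}_{\boldsymbol{\epsilon}_t}\!\left[ c_t(\mathbf{x}_t, \mathbf{u}_t, \boldsymbol{\epsilon}_t) + F_{t+1}\big(f_t(\mathbf{x}_t, \mathbf{u}_t, \boldsymbol{\epsilon}_t)\big) \right], \qquad t = T, \dots, 1, \quad F_{T+1} = \hat{F}_{T+1} \equiv 0, \\
y_{t,j} &= \min_{\mathbf{u}_t} \; \mathbb{E}_{\boldsymbol{\epsilon}_t}\!\left[ c_t(\mathbf{x}_{t,j}, \mathbf{u}_t, \boldsymbol{\epsilon}_t) + \hat{F}_{t+1}\big(f_t(\mathbf{x}_{t,j}, \mathbf{u}_t, \boldsymbol{\epsilon}_t)\big) \right], \qquad j = 1, \dots, N, \\
\hat{F}_t &= \arg\min_{g \in \mathcal{G}} \sum_{j=1}^{N} \big( y_{t,j} - g(\mathbf{x}_{t,j}) \big)^2 .
\end{aligned}
```

In this form both the model class G and the sample size N are fixed before the backward pass begins.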
In addition, the statistical perspective requires the discretization sample size to be determined in advance, whereas the iterative approach of RL determines the sample size automatically and uses the system dynamics to explore the state space. This dissertation develops two types of sequential algorithms. The first type uses a sequential concept based on consistency theory both to identify the approximating model structure and to determine the sample size. The second type uses the system dynamics to sequentially identify the state space region.
The first type of sequential algorithm builds an adaptive value function approximation as the size of the state space sample grows. In the statistical perspective on ADP, value function approximation has two components: (1) design of experiments and (2) statistical modeling. For design of experiments, NTM low-discrepancy sequences are employed because they are generated sequentially, so a sample can be extended without discarding the points already evaluated. For statistical modeling, feed-forward NN models are used because of their consistency property. The adaptive value function approximation (AVFA) is implemented in each stage of the backward-solving DP framework. Three AVFA algorithms are derived from the consistency concept and tested on a nine-dimensional inventory forecasting problem. The first algorithm increases the size of the state space training data at each sequential step and, for each sample size, performs a successive model search to find an optimal NN model. The second algorithm improves on the first by reducing the computation required by the successive model search. The third algorithm takes a more natural view of the consistency concept: at each step, either the sample size is incremented or the complexity of the NN model is increased. An optimal NN model is not sought directly in this algorithm; rather, the consistency concept implies that convergence yields the optimal model.
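The third AVFA algorithm, which at each step either enlarges the sample or grows the model, can be illustrated with a one-dimensional sketch. This is not the dissertation's algorithm, only an analogue under stated assumptions: the van der Corput sequence (the one-dimensional building block of Halton-type low-discrepancy sequences) stands in for the nine-dimensional NTM design, the degree of a Chebyshev polynomial least-squares fit stands in for NN complexity, and the rule for choosing between more data and more complexity (comparing error on the next batch of sequence points with training error) is a hypothetical heuristic. The names `design` and `adaptive_fit` and their parameters are illustrative. The sketch does show why sequentially generated points suit this loop: `design(n)` is always a prefix of `design(n + batch)`, so growing the sample never discards earlier points.

```python
def radical_inverse(i: int, base: int = 2) -> float:
    """Van der Corput radical inverse: mirror the base-b digits of i about the radix point."""
    inv, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        inv += digit / denom
    return inv


def design(n: int, start: int = 1) -> list[float]:
    """n consecutive sequence points beginning at index `start` (an extensible design)."""
    return [radical_inverse(i) for i in range(start, start + n)]


def chebyshev_features(x: float, degree: int) -> list[float]:
    """Chebyshev polynomials T_0..T_degree evaluated at z = 2x - 1, for x in [0, 1]."""
    z = 2.0 * x - 1.0
    feats = [1.0, z]
    for _ in range(2, degree + 1):
        feats.append(2.0 * z * feats[-1] - feats[-2])
    return feats[: degree + 1]


def solve(a, b):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(xs, ys, degree):
    """Least-squares Chebyshev model of the given degree, via the normal equations."""
    phi = [chebyshev_features(x, degree) for x in xs]
    k = degree + 1
    ata = [[sum(p[i] * p[j] for p in phi) for j in range(k)] for i in range(k)]
    aty = [sum(p[i] * y for p, y in zip(phi, ys)) for i in range(k)]
    coef = solve(ata, aty)
    return lambda x: sum(c * f for c, f in zip(coef, chebyshev_features(x, degree)))


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def adaptive_fit(target, n0=8, batch=8, degree0=1, max_n=256, max_degree=12,
                 tol=1e-6, gap_ratio=2.0):
    """Alternate between adding design points and adding model complexity."""
    n, degree, history = n0, degree0, []
    while True:
        # The next `batch` sequence points are held out now and joined to the design later.
        xs, new_xs = design(n), design(batch, start=n + 1)
        ys, new_ys = [target(x) for x in xs], [target(x) for x in new_xs]
        model = fit(xs, ys, degree)
        train_err, test_err = mse(model, xs, ys), mse(model, new_xs, new_ys)
        history.append((n, degree, test_err))
        if test_err < tol or (n >= max_n and degree >= max_degree):
            return model, history
        # Hypothetical rule: a large held-out/train gap, too few points per parameter,
        # or a capped degree means "add data"; otherwise "add complexity".
        need_data = (test_err > gap_ratio * train_err
                     or degree + 2 > n // 2 or degree >= max_degree)
        if need_data and n < max_n:
            n += batch
        else:
            degree += 1
```

For example, `adaptive_fit(lambda x: 1 + 2*x - 3*x**2 + x**3)` stops as soon as the degree reaches 3, because a cubic is then reproduced to rounding error; the returned `history` records the (sample size, degree, held-out error) path the loop took.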
The second type of sequential algorithm explores the state space forward through the stages. Its objective is to identify the region of the state space over which the future value function should be modeled. This region must be specified in the design of experiments component of the statistical perspective, but in practice it is typically unknown; the sequential state space exploration (SSSE) approach therefore fills an important need. Because decisions are required to move forward through the stages, both random and optimal decisions are explored. Combining the SSSE algorithm with the AVFA algorithm yields a novel self-organized forward-backward ADP solution framework consisting of two phases. The first phase performs a single forward SSSE step using random decisions to identify an initial state space region for each stage, followed by a single backward AVFA step to build initial future value function approximations over these regions. The second phase iterates between a forward SSSE step with optimal decisions and a backward AVFA step, updating the state space regions and the future value function approximations until the regions stop changing. This sequential SSSE-AVFA algorithm is also tested on a nine-dimensional stochastic inventory forecasting problem.
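The two-phase forward-backward control flow can be sketched on a hypothetical one-dimensional inventory problem. Everything problem-specific here is an assumption: the order set, the equally likely demand outcomes, the cost coefficients, the three-stage horizon, the fixed initial inventory, and the use of the visited-state interval padded by `margin` as each stage's "state space region". For brevity the backward AVFA step is replaced by plain piecewise-linear interpolation on an 11-point grid over each region rather than an adaptively grown NN, so the sketch shows only how SSSE and the backward pass interact, not the dissertation's approximation method. All function names are illustrative.

```python
import random

ORDERS = range(6)               # hypothetical decision set: order 0..5 units
DEMANDS = [1.0, 2.0, 3.0, 4.0]  # hypothetical equally likely demand outcomes
C_ORDER, C_HOLD, C_SHORT = 1.0, 0.5, 4.0


def stage_cost(x, u, d):
    """Ordering + holding + shortage cost, and the next inventory level."""
    nxt = x + u - d
    return C_ORDER * u + C_HOLD * max(nxt, 0.0) + C_SHORT * max(-nxt, 0.0), nxt


def interp(grid, vals, x):
    """Piecewise-linear interpolation on a uniform grid, clamped outside it."""
    if x <= grid[0]:
        return vals[0]
    if x >= grid[-1]:
        return vals[-1]
    h = grid[1] - grid[0]
    k = min(int((x - grid[0]) / h), len(grid) - 2)
    w = (x - grid[k]) / h
    return (1 - w) * vals[k] + w * vals[k + 1]


def q_value(x, u, next_value):
    """Expected stage cost plus approximate future value of decision u at state x."""
    total = 0.0
    for d in DEMANDS:
        cost, nxt = stage_cost(x, u, d)
        total += cost + next_value(nxt)
    return total / len(DEMANDS)


def backward_pass(regions, n_pts=11):
    """Backward step: approximate V_t on a grid over each stage's region (V_T = 0)."""
    T = len(regions)
    models = [None] * (T + 1)
    models[T] = lambda x: 0.0
    for t in range(T - 1, -1, -1):
        lo, hi = regions[t]
        grid = [lo + (hi - lo) * i / (n_pts - 1) for i in range(n_pts)]
        vals = [min(q_value(x, u, models[t + 1]) for u in ORDERS) for x in grid]
        models[t] = lambda x, g=grid, v=vals: interp(g, v, x)
    return models


def forward_ssse(x0, T, n_paths, rng, models=None, margin=0.5):
    """Forward step: simulate paths and record a padded interval of visited states per stage."""
    visited = [[] for _ in range(T)]
    for _ in range(n_paths):
        x = x0
        for t in range(T):
            visited[t].append(x)
            if models is None:
                u = rng.choice(list(ORDERS))  # phase 1: random decisions
            else:  # phase 2: greedy ("optimal") decisions under the current approximation
                u = min(ORDERS, key=lambda a: q_value(x, a, models[t + 1]))
            x = x + u - rng.choice(DEMANDS)
    return [(min(v) - margin, max(v) + margin) for v in visited]


def ssse_avfa(x0=0.0, T=3, n_paths=200, seed=0, max_iter=20):
    # Phase 1: one random-decision forward pass, then one backward pass.
    regions = forward_ssse(x0, T, n_paths, random.Random(seed))
    models = backward_pass(regions)
    # Phase 2: alternate greedy forward passes and backward passes until regions settle.
    for it in range(max_iter):
        new_regions = forward_ssse(x0, T, n_paths, random.Random(seed), models)
        if new_regions == regions:
            return models, regions, it + 1
        regions = new_regions
        models = backward_pass(regions)
    return models, regions, max_iter
```

Re-seeding the generator with the same `seed` in every phase-two forward pass reuses the same demand draws, so changes in the regions reflect changes in the greedy decisions rather than sampling noise; the `max_iter` cap guards against policies that oscillate instead of settling.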
