Value Function Approximation using Multiple Aggregation for Multiattribute Resource Management

We consider the problem of estimating the value of a multiattribute resource, where the attributes are categorical or discrete and the number of possible attribute vectors is very large. The problem arises in approximate dynamic programming when the value of a multiattribute resource must be estimated from Monte Carlo simulation. Such problems have traditionally been solved using aggregation, but choosing the right level of aggregation requires resolving the classic tradeoff between aggregation error and sampling error. We propose a method that estimates the value of a resource at several levels of aggregation simultaneously and then uses a weighted combination of these estimates. Computing the optimal weights, which minimize the variance of the combined estimate while accounting for correlations between the individual estimates, is too computationally expensive for practical applications. We have found that a simple inverse-variance formula (adjusted for bias), which effectively assumes the estimates are independent, produces near-optimal estimates. We use the setting of two levels of aggregation to explain why this approximation works so well.
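The bias-adjusted inverse-variance weighting described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that the bias adjustment works by adding the squared bias of each level's estimate to its variance, so that each weight is proportional to the inverse of that level's mean squared error. It also assumes the caller already has a value estimate, a variance, and a bias estimate for every aggregation level. One hypothetical way to obtain the bias is to take the difference between a level's estimate and the most disaggregate estimate. The function name `combine_estimates` is hypothetical. As in the text, the estimates are treated as independent, so correlations between levels are ignored.

```python
from __future__ import annotations

from typing import Sequence


def combine_estimates(
    estimates: Sequence[float],
    variances: Sequence[float],
    biases: Sequence[float],
) -> tuple[float, list[float]]:
    """Combine value estimates from several aggregation levels.

    Assumed weighting (hypothetical sketch): w_g is proportional to
    1 / (variance_g + bias_g**2), i.e. inverse mean squared error,
    treating the per-level estimates as independent.
    Returns the combined estimate and the normalized weights.
    """
    if not (len(estimates) == len(variances) == len(biases)):
        raise ValueError("inputs must have the same length")
    if len(estimates) == 0:
        raise ValueError("need at least one estimate")

    # Unnormalized weights: inverse of (variance + squared bias) per level.
    raw = []
    for var, bias in zip(variances, biases):
        mse = var + bias ** 2
        if mse <= 0:
            raise ValueError("variance + bias^2 must be positive")
        raw.append(1.0 / mse)

    # Normalize so the weights sum to one.
    total = sum(raw)
    weights = [r / total for r in raw]

    # Weighted combination of the per-level estimates.
    combined = sum(w * v for w, v in zip(weights, estimates))
    return combined, weights


# Two levels: disaggregate (unbiased, noisy) and aggregate (biased, less noisy).
combined, weights = combine_estimates([10.0, 12.0], [4.0, 1.0], [0.0, 1.0])
print(weights, combined)
```

With a disaggregate estimate of 10.0 (variance 4.0, no bias) and an aggregate estimate of 12.0 (variance 1.0, bias 1.0), the weights are 1/3 and 2/3, giving a combined estimate of about 11.33. The aggregate level receives more weight here because its lower variance outweighs its bias. If its bias grew, weight would shift back toward the disaggregate estimate.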
