A Distributed Decision-Making Structure for Dynamic Resource Allocation Using Nonlinear Functional Approximations

This paper proposes a distributed solution approach to a certain class of dynamic resource allocation problems and develops a dynamic programming-based multiagent decision-making, learning, and communication mechanism. In the class of dynamic resource allocation problems we consider, a set of reusable resources of different types must be assigned to tasks that arrive randomly over time. Assigning a resource to a task removes the task from the system, modifies the state of the resource, and generates a contribution. We build a decision-making scheme in which the decisions regarding resources in different sets of states are made by different agents. We explain how to coordinate the actions of the agents using nonlinear functional approximations, and show that in a distributed setting, nonlinear approximations produce sequences of min-cost network flow problems that naturally yield integer solutions. We also experimentally compare the performance of the centralized and distributed solution strategies.
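To make the min-cost network flow structure concrete, the following is a minimal single-period sketch, not the algorithm of the paper. It assumes that the value of resources in each future state is approximated by a separable, concave, piecewise-linear function given as a list of decreasing marginal values (`slopes`); this form is an assumption made for illustration. All names (`MinCostFlow`, `allocate`, `dest`) and all numbers are hypothetical. Each resource either holds its state or covers one task, and each task sends the resource to a given future state.

```python
INF = float("inf")


class MinCostFlow:
    """Successive shortest paths using Bellman-Ford (arc costs may be negative)."""

    def __init__(self, n):
        self.n = n
        self.adj = [[] for _ in range(n)]
        self.to, self.cap, self.cost = [], [], []

    def add_edge(self, u, v, cap, cost):
        """Add arc u->v and return its index; its reverse arc sits at index ^ 1."""
        idx = len(self.to)
        self.adj[u].append(idx)
        self.to.append(v); self.cap.append(cap); self.cost.append(cost)
        self.adj[v].append(idx + 1)
        self.to.append(u); self.cap.append(0); self.cost.append(-cost)
        return idx

    def flow_on(self, idx):
        # Flow on a forward arc equals the residual capacity of its reverse arc.
        return self.cap[idx ^ 1]

    def solve(self, s, t, demand):
        flow = total_cost = 0
        while flow < demand:
            dist = [INF] * self.n
            prev = [-1] * self.n
            dist[s] = 0
            for _ in range(self.n - 1):
                changed = False
                for u in range(self.n):
                    if dist[u] == INF:
                        continue
                    for e in self.adj[u]:
                        v = self.to[e]
                        if self.cap[e] > 0 and dist[u] + self.cost[e] < dist[v]:
                            dist[v], prev[v], changed = dist[u] + self.cost[e], e, True
                if not changed:
                    break
            if dist[t] == INF:
                break  # no augmenting path remains
            push, v = demand - flow, t
            while v != s:  # find the bottleneck along the shortest path
                push = min(push, self.cap[prev[v]])
                v = self.to[prev[v] ^ 1]
            v = t
            while v != s:  # augment along the path
                self.cap[prev[v]] -= push
                self.cap[prev[v] ^ 1] += push
                v = self.to[prev[v] ^ 1]
            flow += push
            total_cost += push * dist[t]
        return flow, total_cost


def allocate(supply, task_count, contribution, dest, slopes):
    """Maximize contribution plus approximate future value (hypothetical model)."""
    R, T = len(supply), len(task_count)
    source, sink = 0, 1 + 2 * R + T
    res = lambda r: 1 + r              # resources currently in state r
    task = lambda t: 1 + R + t         # tasks of type t
    fut = lambda s: 1 + R + T + s      # resources in state s next period
    net = MinCostFlow(sink + 1)
    assign_arcs = {}
    for r in range(R):
        net.add_edge(source, res(r), supply[r], 0)
        net.add_edge(res(r), fut(r), supply[r], 0)  # hold: keep the current state
        for t in range(T):
            assign_arcs[(r, t)] = net.add_edge(
                res(r), task(t), supply[r], -contribution[r][t])
    for t in range(T):
        net.add_edge(task(t), fut(dest[t]), task_count[t], 0)
    for s in range(R):
        for slope in slopes[s]:        # one unit-capacity arc per marginal value
            net.add_edge(fut(s), sink, 1, -slope)
    flow, cost = net.solve(source, sink, sum(supply))
    plan = {k: net.flow_on(e) for k, e in assign_arcs.items() if net.flow_on(e) > 0}
    return flow, -cost, plan


if __name__ == "__main__":
    flow, value, plan = allocate(
        supply=[2, 1], task_count=[1, 1],
        contribution=[[5, 3], [4, 6]], dest=[1, 0],
        slopes=[[4, 2, 1], [3, 1, 0]])
    print(flow, value, plan)
```

Because the marginal values in each `slopes` list are nonincreasing, the parallel unit-capacity arcs into the sink fill in order of decreasing value, so the piecewise-linear function is represented correctly. With integer supplies and capacities, the successive shortest path procedure returns an integer flow, which is the property the abstract refers to.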
