ADAPTIVE RESOURCE CONTROL: Machine Learning Approaches to Resource Allocation in Uncertain and Changing Environments

The dissertation studies resource allocation problems (RAPs) in uncertain and changing environments. Chapter 1 gives a brief introduction to the motivations and to classical RAPs, followed by a section on Markov decision processes (MDPs), which form the basis of the approach. The core of the thesis consists of two parts: the first deals with uncertainties, namely with stochastic RAPs, while the second studies the effects of changes in the environmental dynamics on learning algorithms. Chapter 2, the first core part, investigates stochastic RAPs with scarce, reusable resources and non-preemptive, interconnected tasks that have temporal extensions. These RAPs are natural generalizations of several standard resource management problems, such as scheduling and transportation problems. First, reactive solutions are considered and defined as policies of suitably reformulated MDPs. This reformulation has several favorable properties: it has finite state and action spaces and it is aperiodic, hence all policies are proper and the space of policies can be safely restricted. Proactive solutions are also proposed and defined as policies of special partially observable MDPs. Next, reinforcement learning (RL) methods, such as fitted Q-learning, are suggested for computing a policy. To maintain the value function compactly, two representations are studied: hash tables and support vector regression (SVR), in particular ν-SVRs. Several additional improvements are investigated as well, such as the application of rollout algorithms in the initial phases, action space decomposition, task clustering and distributed sampling. Chapter 3, the second core part, studies the possibility of applying value-function-based RL methods when the environment may change over time.
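Chapter 2 suggests RL methods such as fitted Q-learning and names hash tables as one of the two value-function representations. As a hedged illustration of that idea only, the sketch below runs plain tabular Q-learning (not the thesis's fitted variant) on a hypothetical 3-state, 2-action cost-minimizing MDP, storing the Q-values in a Python dict, i.e. a hash table keyed by (state, action) pairs. The toy model, the uniform exploration scheme, the step size 1/n^0.7 and all names (TRANSITIONS, COSTS, q_learning, exact_q, greedy_policy) are assumptions made for this example; the learned policy is compared with a reference obtained by value iteration on the known model.

```python
import random

# Hypothetical toy MDP, NOT a resource allocation model from the thesis.
# TRANSITIONS[(s, a)] lists (probability, next_state); COSTS[(s, a)] is the immediate cost.
STATES, ACTIONS, GAMMA = (0, 1, 2), (0, 1), 0.9
TRANSITIONS = {
    (0, 0): [(0.8, 1), (0.2, 0)],
    (0, 1): [(0.9, 2), (0.1, 0)],
    (1, 0): [(1.0, 2)],
    (1, 1): [(0.5, 2), (0.5, 0)],
    (2, 0): [(1.0, 2)],
    (2, 1): [(1.0, 2)],
}
COSTS = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 0.5, (2, 0): 0.0, (2, 1): 1.0}


def sample_next_state(rng, s, a):
    """Draw a successor of (s, a) according to TRANSITIONS."""
    u, acc = rng.random(), 0.0
    for p, s_next in TRANSITIONS[(s, a)]:
        acc += p
        if u < acc:
            return s_next
    return TRANSITIONS[(s, a)][-1][1]  # guard against floating-point round-off


def q_learning(num_updates=200_000, seed=0):
    """Tabular Q-learning for cost minimization; Q lives in a dict (hash table)."""
    rng = random.Random(seed)
    q = {}       # (state, action) -> estimated cost-to-go; absent keys read as 0.0
    visits = {}  # (state, action) -> update count, used for the decaying step size
    for _ in range(num_updates):
        s, a = rng.choice(STATES), rng.choice(ACTIONS)  # uniform exploration (assumed)
        s_next = sample_next_state(rng, s, a)
        target = COSTS[(s, a)] + GAMMA * min(q.get((s_next, b), 0.0) for b in ACTIONS)
        visits[(s, a)] = visits.get((s, a), 0) + 1
        alpha = visits[(s, a)] ** -0.7  # step size 1 / n^0.7
        old = q.get((s, a), 0.0)
        q[(s, a)] = old + alpha * (target - old)
    return q


def exact_q(iterations=500):
    """Reference Q* from repeated Bellman optimality updates with the known model."""
    q = {key: 0.0 for key in COSTS}
    for _ in range(iterations):
        q = {
            (s, a): COSTS[(s, a)]
            + GAMMA * sum(p * min(q[(t, b)] for b in ACTIONS) for p, t in TRANSITIONS[(s, a)])
            for (s, a) in COSTS
        }
    return q


def greedy_policy(q):
    """Action with the lowest estimated cost in each state."""
    return [min(ACTIONS, key=lambda a: q.get((s, a), 0.0)) for s in STATES]
```

For this toy model, greedy_policy(exact_q()) returns [1, 1, 0]; with the default seed, greedy_policy(q_learning()) is expected to agree. A dict is the simplest hash-table representation; the thesis's alternative, ν-SVR, would replace it with a regression model fitted to sampled targets.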
Chapter 3 first presents theorems showing that, for a discounted MDP, both the optimal value function and the value function of a fixed control policy depend Lipschitz continuously on the immediate-cost function and the transition-probability function. Dependence on the discount factor is also analyzed and shown to be non-Lipschitz. Afterwards, the concept of (ε, δ)-MDPs is introduced, which generalizes both MDPs and ε-MDPs. In this model the transition-probability function and the immediate-cost function may vary over time, but the changes must be asymptotically bounded. Then, learning in changing environments is investigated: a general relaxed convergence theorem for stochastic iterative algorithms is presented and illustrated through three classical examples, namely value iteration, Q-learning and TD-learning. Finally, Chapter 4 presents results of numerical experiments on both benchmark and industry-related problems, demonstrating the effectiveness of the proposed adaptive resource allocation approach as well as of learning in the presence of disturbances and changes.
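The Lipschitz dependence on the immediate-cost function can be checked numerically in its simplest form. Because the Bellman optimality operator T_c is a γ-contraction in the sup-norm, ||V1 - V2|| = ||T_c1 V1 - T_c2 V2|| ≤ γ||V1 - V2|| + ||c1 - c2||, so ||V1 - V2||∞ ≤ ||c1 - c2||∞ / (1 - γ). The sketch below, which assumes randomly generated MDPs, γ = 0.9 and hypothetical helper names (random_mdp, optimal_values, lipschitz_check), perturbs only the costs and compares the change in the optimal value function with this bound. It does not reproduce the thesis's theorems for transition-probability changes or their exact constants.

```python
import random

GAMMA = 0.9  # discount factor, assumed for the example


def random_mdp(num_states, num_actions, rng):
    """Hypothetical random MDP: P[s][a][t] is the probability of moving from s to t under a."""
    P = []
    for _ in range(num_states):
        rows = []
        for _ in range(num_actions):
            weights = [rng.random() + 1e-3 for _ in range(num_states)]
            total = sum(weights)
            rows.append([w / total for w in weights])
        P.append(rows)
    return P


def optimal_values(P, cost, gamma=GAMMA, iterations=600):
    """Value iteration for the optimal cost-to-go V* (minimization)."""
    v = [0.0] * len(P)
    for _ in range(iterations):
        v = [
            min(
                cost[s][a] + gamma * sum(p * v[t] for t, p in enumerate(P[s][a]))
                for a in range(len(P[s]))
            )
            for s in range(len(P))
        ]
    return v


def sup_norm(xs):
    """Maximum absolute entry of a flat list."""
    return max(abs(x) for x in xs)


def lipschitz_check(seed, num_states=6, num_actions=3, noise=0.05):
    """Perturb only the costs; return (||V1 - V2||_inf, ||c1 - c2||_inf / (1 - gamma))."""
    rng = random.Random(seed)
    P = random_mdp(num_states, num_actions, rng)
    cost = [[rng.uniform(0.0, 1.0) for _ in range(num_actions)] for _ in range(num_states)]
    perturbed = [[c + rng.uniform(-noise, noise) for c in row] for row in cost]
    v1, v2 = optimal_values(P, cost), optimal_values(P, perturbed)
    value_gap = sup_norm([a - b for a, b in zip(v1, v2)])
    cost_gap = sup_norm([c1 - c2 for r1, r2 in zip(cost, perturbed) for c1, c2 in zip(r1, r2)])
    return value_gap, cost_gap / (1.0 - GAMMA)


if __name__ == "__main__":
    for seed in range(3):
        gap, bound = lipschitz_check(seed)
        print(f"seed {seed}: value change {gap:.4f} <= bound {bound:.4f}")
```

The bound is attained when every cost shifts by the same constant, since V* then shifts by that constant divided by (1 - γ); random perturbations typically stay well below it.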

Balázs Csanád Csáji
