Hierarchical decision making in semiconductor fabs using multi-time scale Markov decision processes

Decision making in semiconductor fabs occurs on different timescales: decisions on buying or discarding machines are made on the slower timescale, while those dealing with capacity allocation and switchover are made on the faster timescale. We formulate this problem along the lines of a recently developed multi-time scale Markov decision process (MMDP) framework. We present numerical experiments in which we use TD(0) and Q-learning algorithms with a linear approximation architecture, and compare these with the policy iteration algorithm. The experiments cover two different scenarios. In the first, transition probabilities are computed explicitly and used in the algorithms. In the second, transitions are simulated without explicitly computing the transition probabilities. We observe that TD(0) requires less computation than Q-learning. Moreover, algorithms that use simulated transitions require significantly less computation than their counterparts that compute transition probabilities.
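
Since the abstract only names the algorithms, the following is a minimal sketch of TD(0) policy evaluation with a linear approximation architecture, in the spirit of the simulated-transitions scenario (no transition probabilities are computed explicitly). The environment interface `env_step`, the feature map `phi`, the step size, and the toy chain at the end are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def td0_linear(env_step, phi, num_features, num_episodes=500,
               alpha=0.01, gamma=0.95, rng=None):
    """TD(0) policy evaluation with a linear architecture V(s) ~ theta . phi(s).

    `env_step(state, rng)` is an assumed interface that returns
    (next_state, reward, done) for one simulated transition under a
    fixed policy -- no transition probabilities are ever formed.
    """
    rng = rng or np.random.default_rng(0)
    theta = np.zeros(num_features)
    for _ in range(num_episodes):
        state, done = 0, False              # assume an integer start state
        while not done:
            next_state, reward, done = env_step(state, rng)
            v_s = theta @ phi(state)
            v_next = 0.0 if done else theta @ phi(next_state)
            # TD(0) update: step theta along phi(state), scaled by the
            # temporal-difference error delta.
            delta = reward + gamma * v_next - v_s
            theta += alpha * delta * phi(state)
            state = next_state
    return theta

# Toy usage on a hypothetical 2-state chain with one-hot features.
def env_step(s, rng):
    ns = int(rng.integers(2))               # uniform random next state
    return ns, float(ns == 1), bool(rng.random() < 0.1)

theta = td0_linear(env_step, phi=lambda s: np.eye(2)[s], num_features=2)
```

Q-learning with the same linear architecture would additionally maintain per-action weights and maximize over actions in the update target, which is one source of its higher computational cost relative to TD(0).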