Monte-Carlo tree search in production management problems

Classical search algorithms rely on a sufficiently powerful evaluation function for non-terminal states. In many task domains, developing such an evaluation function requires substantial effort and domain knowledge, or is not possible at all. In recent years, Monte-Carlo evaluation has been successfully applied as an alternative in such domains. In this paper, we apply Monte-Carlo Tree Search, a search algorithm based on Monte-Carlo evaluation, to production management problems. These are single-agent problems in which a sequence of actions with side effects must be selected so as to obtain high quantities of one or more goal products. They are challenging and can be constructed with highly variable difficulty. Earlier research yielded an offline learning algorithm that finds good solutions but takes a long time to run. We show that Monte-Carlo Tree Search finds a solution in less time than this algorithm and produces better solutions for large problems. Our findings can be generalized to other task domains.
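
To make the setting concrete, the following is a minimal sketch of Monte-Carlo Tree Search on a toy production management problem. It is not the implementation evaluated in this paper. The products (wood, planks, chairs), the three actions, the horizon of 8 steps, the UCT selection rule, and the exploration constant are all assumptions chosen for illustration. Each action consumes some products and yields others, and the reward of a random playout is the quantity of the goal product at the end.

```python
import math
import random

# Hypothetical toy problem (not from the paper): a state is a tuple
# (wood, planks, chairs) and 'chairs' is the goal product.
# Each action maps to (cost, gain); applying it has side effects on stock.
ACTIONS = {
    "chop":  ((0, 0, 0), (2, 0, 0)),
    "saw":   ((1, 0, 0), (0, 2, 0)),
    "build": ((0, 3, 0), (0, 0, 1)),
}
HORIZON = 8  # assumed fixed number of actions per episode


def legal_actions(state):
    """Actions whose cost can be paid from the current stock."""
    return [a for a, (cost, _) in ACTIONS.items()
            if all(s >= c for s, c in zip(state, cost))]


def apply_action(state, action):
    cost, gain = ACTIONS[action]
    return tuple(s - c + g for s, c, g in zip(state, cost, gain))


class Node:
    def __init__(self, state, depth, parent=None):
        self.state, self.depth, self.parent = state, depth, parent
        self.children = {}   # action -> Node
        self.visits = 0
        self.total = 0.0     # sum of playout rewards seen through this node

    def untried(self):
        return [a for a in legal_actions(self.state) if a not in self.children]


def rollout(state, depth, rng):
    """Monte-Carlo evaluation: play random legal actions until the horizon."""
    while depth < HORIZON:
        state = apply_action(state, rng.choice(legal_actions(state)))
        depth += 1
    return state[2]  # quantity of the goal product


def uct_child(node, c):
    """Pick the child maximizing mean reward plus an exploration bonus."""
    return max(
        node.children.values(),
        key=lambda ch: ch.total / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def mcts(root_state, iterations=2000, c=1.4, seed=0):
    rng = random.Random(seed)
    root = Node(root_state, 0)
    for _ in range(iterations):
        node = root
        # Selection: descend while the node is fully expanded.
        while node.depth < HORIZON and not node.untried():
            node = uct_child(node, c)
        # Expansion: add one untried child, unless at the horizon.
        if node.depth < HORIZON:
            action = rng.choice(node.untried())
            child = Node(apply_action(node.state, action), node.depth + 1, node)
            node.children[action] = child
            node = child
        # Simulation: random playout from the new node.
        reward = rollout(node.state, node.depth, rng)
        # Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total += reward
            node = node.parent
    # Recommend the most visited action at the root.
    return max(root.children, key=lambda a: root.children[a].visits)
```

Calling `mcts((0, 0, 0))` returns the recommended first action from an empty stock. In this toy problem "chop" costs nothing, so at least one action is always legal and playouts never get stuck; a realistic problem would need to handle dead ends and would likely need rewards rescaled to suit the exploration constant.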