Machine Learning for Real-Time Decision Making

Abstract: Many problems of interest to the Air Force involve routine sequential decision making under uncertainty. Examples include air traffic control, control of autonomous surveillance aircraft, logistics planning and scheduling, and equipment diagnosis and repair. Such problems can be formulated within the framework of Markov Decision Problems (MDPs) and Partially Observable Markov Decision Problems (POMDPs). Reinforcement learning is the study of adaptive methods for solving large MDPs and POMDPs. The research funded under this grant developed a hierarchical approach to solving MDPs, called the MAXQ method, that is much more effective than previous non-hierarchical methods. Theoretical analysis proves that MAXQ converges to the optimal solution, and experimental studies show that it gives very large speedups during learning. A second line of research developed two methods for approximately solving large POMDPs. This research also explored cost-sensitive learning and diagnosis by formulating them as POMDPs and applying specialized reinforcement learning methods to solve them. A third line of research focused on function approximation methods and algorithms for practical reinforcement learning. New representations (based on regression trees and support vector machines) and new algorithms (based on more appropriate objective functions) led to improvements in the quality of solutions and enabled the practical application of reinforcement learning to resource-constrained scheduling problems.
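
To make the MDP formulation concrete, the sketch below shows ordinary tabular Q-learning on a toy grid-world MDP. This is generic background for the reinforcement learning setting described above, not the MAXQ algorithm or the POMDP methods developed under the grant; the grid size, reward structure, and learning parameters are illustrative assumptions.

    # Minimal sketch: tabular Q-learning on a toy grid-world MDP.
    # Illustrative background only -- not the MAXQ method or the POMDP
    # algorithms developed in this research.
    import random
    from collections import defaultdict

    ROWS, COLS = 4, 4            # assumed grid dimensions
    GOAL = (3, 3)                # assumed absorbing goal state
    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    def step(state, action):
        """Apply an action; each move costs -1, reaching the goal yields 0."""
        r = min(max(state[0] + action[0], 0), ROWS - 1)
        c = min(max(state[1] + action[1], 0), COLS - 1)
        nxt = (r, c)
        return nxt, (0.0 if nxt == GOAL else -1.0), nxt == GOAL

    def q_learning(episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
        Q = defaultdict(float)   # Q[(state, action)] -> estimated return
        for _ in range(episodes):
            state, done = (0, 0), False
            while not done:
                # epsilon-greedy action selection
                if random.random() < epsilon:
                    action = random.choice(ACTIONS)
                else:
                    action = max(ACTIONS, key=lambda a: Q[(state, a)])
                nxt, reward, done = step(state, action)
                best_next = max(Q[(nxt, a)] for a in ACTIONS)
                # one-step temporal-difference update toward reward + discounted value
                Q[(state, action)] += alpha * (reward + gamma * best_next
                                               - Q[(state, action)])
                state = nxt
        return Q

    if __name__ == "__main__":
        Q = q_learning()
        print("Greedy action at start state:",
              max(ACTIONS, key=lambda a: Q[((0, 0), a)]))

A hierarchical method such as MAXQ improves on this flat scheme by decomposing the overall task into subtasks, each with its own value function, so that experience learned in one subtask can be reused across contexts.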