P3VI: a partitioned, prioritized, parallel value iterator

We present an examination of the state-of-the-art for using value iteration to solve large-scale discrete Markov Decision Processes. We introduce an architecture which combines three independent performance enhancements (the intelligent prioritization of computation, state partitioning, and massively parallel processing) into a single algorithm. We show that each idea improves performance in a different way, meaning that algorithm designers do not have to trade one improvement for another. We give special attention to parallelization issues, discussing how to efficiently partition states, distribute partitions to processors, minimize message passing and ensure high scalability. We present experimental results which demonstrate that this approach solves large problems in reasonable time.