Robust Anytime Learning of Markov Decision Processes