LEARNING ALGORITHMS FOR MARKOV DECISION PROCESSES