An Adaptive Optimal Controller for Discrete-Time Markov Environments