Self learning control of constrained Markov chains - a gradient approach