An algorithm for solving semi-markov decision problems using reinforcement learning: convergence analysis and numerical results