Reinforcement learning and adaptive optimization of a class of Markov jump systems with completely unknown dynamic information