Inverse Reinforcement Q-Learning Through Expert Imitation for Discrete-Time Systems