Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning