Solving Reward-Collecting Problems with UAVs: A Comparison of Online Optimization and Q-Learning