Reinforcement Learning for Short-Term Production Scheduling with Sequence-Dependent Setup Waste