A Reinforcement Learning Method for Train Marshaling Based on Movements of Locomotive

In this paper, a new reinforcement learning system for generating a marshaling plan for freight cars in a train is designed. In the proposed method, the total transfer distance of the locomotive is minimized to obtain the desired layout of freight cars for an outbound train. The order of freight-car movements, the position of each removed car, the layout of cars in the train, and the number of cars to be moved are optimized simultaneously to achieve the desired layout of the outbound train. Initially, freight cars are placed in the freight yard in a random layout; they are then moved and lined up on a main track in a certain desired order to assemble an outbound train. The layout and the movements of the freight cars describe the state of the marshaling yard, and state transitions are defined based on a Markov Decision Process (MDP). Q-Learning is applied so that the evaluation values reflect both the transfer distance and the number of locomotive movements required to achieve one of the desired layouts. After sufficient autonomous learning, the optimal schedule can be obtained by selecting the series of freight-car movements with the best evaluation.
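
To illustrate the kind of learning rule the abstract refers to, the following is a minimal sketch, not the authors' implementation: a tabular Q-Learning update in which the reward penalizes the locomotive's transfer distance for each movement, so that shorter schedules accumulate better evaluation values. The state/action encodings, the distance argument, and all hyperparameters below are illustrative assumptions.

```python
import random
from collections import defaultdict

# Assumed hyperparameters (not specified in the abstract)
ALPHA = 0.1    # learning rate
GAMMA = 0.95   # discount factor
EPSILON = 0.1  # exploration rate

# Q[(state, action)] -> evaluation value; states encode the yard layout,
# actions encode feasible freight-car movements (assumed encodings)
Q = defaultdict(float)


def choose_action(state, actions):
    """Epsilon-greedy selection among feasible car movements."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def update(state, action, next_state, next_actions, transfer_distance, done):
    """One Q-Learning step; the negative transfer distance acts as the reward,
    so both the distance and the number of movements lower the evaluation."""
    reward = -transfer_distance
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in next_actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```

After learning, the best schedule would be read off by greedily following the highest-valued actions from the initial yard layout, in line with the selection step described above.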