Sample-Efficient, Exploration-Based Policy Optimisation for Routing Problems