Non-Markovian Reward Modelling from Trajectory Labels via Interpretable Multiple Instance Learning