Reinforcement Learning to Rank with Coarse-grained Labels