Simple statistical gradient-following algorithms for connectionist reinforcement learning