Directly Fine-Tuning Diffusion Models on Differentiable Rewards