Masked Diffusion Models are Fast Learners