VGDiffZero: Text-to-image Diffusion Models Can Be Zero-shot Visual Grounders