Non-Autoregressive Coarse-to-Fine Video Captioning