Non-Autoregressive Cross-Modal Coherence Modelling