Video-guided machine translation via dual-level back-translation