Hierarchical decoding with latent context for image captioning