Reasoning like Humans: On Dynamic Attention Prior in Image Captioning