Attention Models for Image and Video Caption Generation