Automatic Caption Generation for News Images

Abstract

This thesis is concerned with the task of automatically generating captions for images, which is important for many image-related applications. Our model learns to create captions from a publicly available dataset that has not been explicitly labelled for our task: a collection of news articles, the pictures embedded in them, and their captions. Caption generation proceeds in two stages. The first stage, content selection, identifies what the image and its accompanying article are about; the second stage, surface realization, determines how to render the chosen content as a grammatical caption. For content selection we use a probabilistic image annotation model that suggests keywords for an image. This model postulates that images and their textual descriptions are generated by a shared set of latent variables (topics), and is trained on a weakly labelled dataset which treats the captions and associated news articles as image labels. For surface realization we use an abstractive model that generates captions which compare favourably with human-generated ones.
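The content-selection idea above can be illustrated with a toy sketch. The following is a minimal, self-contained approximation, not the thesis model itself: a small PLSA-style topic model fitted by EM over mixed bags of tokens, where each training image is represented by hypothetical "visual" tokens (prefixed `vis:`, an illustrative convention) together with words from its caption and article. After training, textual keywords for a new image are ranked by summing over the latent topics inferred from its visual tokens. All function names, the toy data, and the crude fold-in heuristic are assumptions made for this sketch.

```python
import random
from collections import Counter

def train_plsa(docs, num_topics=2, iters=50, seed=0):
    """Fit a tiny PLSA model via EM, returning P(word | topic) per topic.

    docs: list of token lists; each doc mixes 'visual' and textual tokens.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    counts = [Counter(d) for d in docs]
    # Random, normalised initialisation of P(w|z) and uniform P(z|d).
    p_w_z = []
    for _ in range(num_topics):
        dist = {w: rng.random() for w in vocab}
        s = sum(dist.values())
        p_w_z.append({w: v / s for w, v in dist.items()})
    p_z_d = [[1.0 / num_topics] * num_topics for _ in docs]

    for _ in range(iters):
        new_w_z = [Counter() for _ in range(num_topics)]
        new_z_d = [[0.0] * num_topics for _ in docs]
        for di, cnt in enumerate(counts):
            for w, n in cnt.items():
                # E-step: posterior over topics for this (doc, word) pair.
                post = [p_z_d[di][z] * p_w_z[z][w] for z in range(num_topics)]
                total = sum(post) or 1.0
                for z in range(num_topics):
                    r = n * post[z] / total
                    new_w_z[z][w] += r
                    new_z_d[di][z] += r
        # M-step: re-estimate and renormalise both distributions.
        for z in range(num_topics):
            s = sum(new_w_z[z].values()) or 1.0
            p_w_z[z] = {w: new_w_z[z].get(w, 0.0) / s for w in vocab}
        for di in range(len(docs)):
            s = sum(new_z_d[di]) or 1.0
            p_z_d[di] = [v / s for v in new_z_d[di]]
    return p_w_z

def suggest_keywords(p_w_z, visual_tokens, text_vocab, top_k=3):
    """Rank textual words for a new image given only its visual tokens.

    Crude fold-in: topic weights proportional to the likelihood each topic
    assigns to the image's visual tokens.
    """
    num_topics = len(p_w_z)
    weights = []
    for z in range(num_topics):
        lik = 1.0
        for t in visual_tokens:
            lik *= p_w_z[z].get(t, 1e-9)
        weights.append(lik)
    s = sum(weights) or 1.0
    weights = [w / s for w in weights]
    scores = {w: sum(weights[z] * p_w_z[z].get(w, 0.0)
                     for z in range(num_topics))
              for w in text_vocab}
    return [w for w, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]]

# Toy 'weakly labelled' data: visual tokens plus caption/article words.
docs = [
    ["vis:ball", "vis:grass", "football", "match", "goal"],
    ["vis:ball", "vis:grass", "football", "striker"],
    ["vis:podium", "vis:flag", "election", "speech", "minister"],
    ["vis:podium", "vis:flag", "election", "vote"],
]
model = train_plsa(docs, num_topics=2)
text_vocab = {"football", "match", "goal", "striker",
              "election", "speech", "minister", "vote"}
keywords = suggest_keywords(model, ["vis:ball", "vis:grass"], text_vocab)
print(keywords)
```

The sketch uses PLSA rather than the richer annotation model described in the thesis, but it captures the same premise: visual and textual tokens are tied together through shared latent topics, so an unlabelled image can be annotated by inferring its topics from visual evidence alone.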