Understanding images with natural sentences

We propose a novel system which generates sentential captions for general images. For people to use numerous images effectively on the web, technologies must be able to explain image contents and must be capable of searching for data that users need. Moreover, images must be described with natural sentences based not only on the names of objects contained in an image but also on their mutual relations. The proposed system uses general images and captions available on the web as training data to generate captions for new images. Furthermore, because the learning cost is independent from the amount of data, the system has scalability, which makes it useful with large-scale data.