Figure Captioning in Scholarly Literature to Augment Search Results

Figures convey useful information, such as trends, proportions, and values, in a concise format. People can understand these attributes at a glance, but machines have difficulty processing them. When searching for figures, the end user is presented with a caption that often does not contain enough information to interpret the figure. In this paper, we propose a novel end-to-end framework for scholarly figure captioning. In the figure parsing module, figures are localized, classified, and analyzed, and the plotted data and its association with the legend entries are extracted. In the text processing module, figure-related sentences are identified and scored by their relevance to the figure. A subset of these sentences is then selected, with its size chosen to balance information content against the length of the generated caption. The resulting captions enable a variety of applications, such as figure search engines and figure query answering. Experiments show that the proposed framework generates effective figure captions as measured by several metrics.
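The sentence selection step can be illustrated with a minimal sketch. The abstract does not state the objective that is optimized, so the code below assumes one: sentences are ranked by a relevance score, and the number of sentences kept is the one that maximizes total relevance minus a penalty proportional to caption length in characters. The `Sentence` type, the `select_caption_sentences` function, and the `length_penalty` parameter are hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    relevance: float  # assumed relevance-to-figure score, higher is better


def select_caption_sentences(sentences, length_penalty=0.002):
    """Keep the top-k sentences by relevance, where k maximizes
    (sum of relevance) - length_penalty * (total characters).

    This objective is an assumption for illustration, not the
    framework's actual selection criterion.
    """
    ranked = sorted(sentences, key=lambda s: s.relevance, reverse=True)

    best_k, best_score = 0, 0.0
    total_relevance, total_length = 0.0, 0
    for k, sentence in enumerate(ranked, start=1):
        total_relevance += sentence.relevance
        total_length += len(sentence.text)
        score = total_relevance - length_penalty * total_length
        if score > best_score:
            best_k, best_score = k, score

    # Return the chosen sentences in their original document order.
    position = {id(s): i for i, s in enumerate(sentences)}
    return sorted(ranked[:best_k], key=lambda s: position[id(s)])
```

Under this assumed objective, a long sentence with low relevance is dropped because its length cost outweighs the information it adds, while short, highly relevant sentences are retained.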