Scientific metadata quality enhancement for scholarly publications

Keyword metadata plays a key role in the access, retrieval, and management of scientific publications. However, author-assigned keywords are not always available in digital repositories. In this study, to enhance metadata quality, we explore several automatic methods for inferring keywords from scholarly articles, including supervised topic modeling, language modeling, and mutual information. Evaluation results showed that a linear combination of mutual information and topic modeling applied to full text outperformed the other methods on mean average precision (MAP), while language modeling applied to abstracts performed best on precision@10.
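The abstract names a linear combination of two keyword scorers and two evaluation measures, MAP and precision@10. The sketch below illustrates these ideas and is not the authors' implementation. It assumes each method yields a dictionary mapping candidate keywords to scores already normalized to a comparable range. The abstract does not say how scores were normalized or weighted, so the interpolation weight `lam`, the function names, and the toy keywords are all hypothetical. The metrics follow the standard definitions of average precision and precision@k.

```python
from typing import Dict, List, Sequence, Set, Tuple


def combine_scores(mi: Dict[str, float], topic: Dict[str, float],
                   lam: float = 0.5) -> Dict[str, float]:
    """Linearly interpolate two keyword-score maps (hypothetical weight lam).

    Assumes both maps are on comparable scales; a keyword missing from
    one map contributes 0 from that map.
    """
    keys = set(mi) | set(topic)
    return {k: lam * mi.get(k, 0.0) + (1 - lam) * topic.get(k, 0.0) for k in keys}


def rank(scores: Dict[str, float]) -> List[str]:
    """Sort keywords by descending score, breaking ties alphabetically."""
    return sorted(scores, key=lambda k: (-scores[k], k))


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked keywords that are relevant (divides by k)."""
    return sum(1 for kw in ranked[:k] if kw in relevant) / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant keyword appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, kw in enumerate(ranked, start=1):
        if kw in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant)


def mean_average_precision(runs: Sequence[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP over documents: each run is (ranked keywords, gold keywords)."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    # Toy scores for one article; values are illustrative only.
    mi = {"metadata": 0.9, "topic model": 0.4, "keyword": 0.7}
    topic = {"metadata": 0.5, "topic model": 0.8, "retrieval": 0.6}
    gold = {"metadata", "retrieval"}  # stand-in for author-assigned keywords

    ranked = rank(combine_scores(mi, topic, lam=0.5))
    print(ranked)
    print(average_precision(ranked, gold), precision_at_k(ranked, gold, k=10))
```

In this toy run the combined ranking is `metadata`, `topic model`, `keyword`, `retrieval`. The two gold keywords appear at ranks 1 and 4, which gives an average precision of (1/1 + 2/4) / 2 = 0.75. Precision@10 is 2/10 = 0.2, because the cutoff exceeds the number of candidates. This shows how the two measures can favor different methods: MAP rewards placing every relevant keyword high in the ranking, while precision@10 counts only how many relevant keywords fall in the first ten.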