Compiling Keyphrase Candidates for Scientific Literature Based on Wikipedia

Keyphrase candidate compilation is a crucial step for both supervised and unsupervised keyphrase extractors. Traditional methods usually rely on the lexical or frequency properties of phrases to build the candidate list. However, terms collected on the basis of these properties are not always semantically meaningful. We show that Wikipedia can serve as a valuable auxiliary resource for compiling meaningful keyphrase candidates for scientific literature. We conducted empirical experiments on digital libraries from two disciplines, Computer Science and Chemistry. The results suggest that Wikipedia has good coverage of both disciplines and could potentially be applied to other scientific disciplines.
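The abstract does not specify how Wikipedia is used to select candidates. The following is a minimal sketch under one simple assumption: a candidate is any word n-gram of up to three words that exactly matches a lowercased Wikipedia article title. The functions `ngrams` and `wikipedia_candidates` and the tiny title set are hypothetical. In practice, the title set would come from a full list of Wikipedia article titles, and this sketch should not be read as the authors' actual procedure.

```python
import re
from typing import Iterable, List, Set


def ngrams(tokens: List[str], max_n: int) -> Iterable[str]:
    """Yield all word n-grams of length 1..max_n (shorter n-grams first)."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def wikipedia_candidates(text: str, titles: Set[str], max_n: int = 3) -> List[str]:
    """Keep n-grams that exactly match a (lowercased) Wikipedia title.

    Assumption (hypothetical): exact string match against titles stands in
    for whatever Wikipedia-based filtering a real system would use.
    """
    # Lowercase and split into alphanumeric tokens (hyphenated words kept whole).
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    seen: Set[str] = set()
    result: List[str] = []
    for gram in ngrams(tokens, max_n):
        if gram in titles and gram not in seen:
            seen.add(gram)  # deduplicate while preserving first-seen order
            result.append(gram)
    return result


if __name__ == "__main__":
    # Hypothetical stand-in for a full Wikipedia title list.
    titles = {"support vector machine", "named entity recognition", "machine", "entity"}
    text = "We train a support vector machine for named entity recognition."
    print(wikipedia_candidates(text, titles))
    # → ['machine', 'entity', 'support vector machine', 'named entity recognition']
```

Unlike a purely frequency-based filter, this lookup keeps a phrase such as "support vector machine" even when it occurs only once, because the phrase is an established concept with its own Wikipedia article. It also discards frequent but meaningless spans such as "train a".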