Paper information - Keyphrase Extraction in Scientific Publications

Keyphrase Extraction in Scientific Publications

We present a keyphrase extraction algorithm for scientific publications. Unlike previous work, we introduce features that capture the positions of phrases in a document with respect to the logical sections found in scientific discourse. We also introduce features that capture salient morphological phenomena found in scientific keyphrases, such as whether a candidate keyphrase is an acronym or uses specific terminologically productive suffixes. We have implemented these features on top of the baseline feature set used by Kea [1]. In our evaluation using a corpus of 120 scientific publications multiply annotated for keyphrases, our system significantly outperformed Kea at the p < .05 level. As we know of no other existing multiply annotated keyphrase document collections, we have also made our evaluation corpus publicly available. We hope that this contribution will spur future comparative research.
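
The two feature families named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the section labels, the suffix list, the acronym rule, and all function names below are assumptions made for illustration, and plain substring matching stands in for whatever candidate matching the actual system uses.

```python
import re

# Hypothetical section labels and suffix list, chosen for illustration only;
# the abstract does not enumerate either.
SECTIONS = ["title", "abstract", "introduction", "method", "evaluation", "conclusion"]
SUFFIXES = ("tion", "ment", "ness", "ity", "ics")


def is_acronym(phrase):
    """Assumed rule: one token, starts with an uppercase letter,
    followed by one or more uppercase letters or digits."""
    return bool(re.fullmatch(r"[A-Z][A-Z0-9]+", phrase))


def has_productive_suffix(phrase):
    """Check whether the last word of the phrase ends in one of the assumed suffixes."""
    words = phrase.lower().split()
    return bool(words) and words[-1].endswith(SUFFIXES)


def section_occurrence(phrase, sections):
    """Return a 0/1 vector marking the sections whose text contains the phrase.

    `sections` maps a section label to its text. Matching is a
    case-insensitive substring test, which is a simplification.
    """
    needle = phrase.lower()
    return [1 if needle in sections.get(name, "").lower() else 0 for name in SECTIONS]


def candidate_features(phrase, sections):
    """Bundle the illustrative features for one candidate phrase."""
    return {
        "is_acronym": is_acronym(phrase),
        "has_suffix": has_productive_suffix(phrase),
        "section_vector": section_occurrence(phrase, sections),
    }


doc = {
    "title": "Keyphrase extraction",
    "abstract": "We use an SVM for keyphrase extraction.",
}
print(candidate_features("keyphrase extraction", doc))
print(candidate_features("SVM", doc))
```

In a supervised setting, features like these would be appended to a baseline feature vector for each candidate before training a classifier; how the paper weights or encodes them is not stated in the abstract.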

Min-Yen Kan | Thuy Dung Nguyen

[1] W. John Wilbur, et al. Corpus-based statistical screening for content-bearing terms, 2001, J. Assoc. Inf. Sci. Technol.

[2] Christopher D. Manning, et al. Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger, 2000, EMNLP.

[3] Ken Barker, et al. Using Noun Phrase Heads to Extract Document Keyphrases, 2000, Canadian Conference on AI.

[4] Peter D. Turney. Coherent Keyphrase Extraction via Web Mining, 2003, IJCAI.

[5] Bruno Pouliquen, et al. Automatic annotation of multilingual text collections with a conceptual thesaurus, 2006, ArXiv.

[6] Nguyen Thuy Dung, et al. Automatic Keyphrase Generation, 2007.

[7] Matthew Hurst, et al. A Language Model Approach to Keyphrase Extraction, 2003, ACL 2003.

[8] Yi-fang Brook Wu, et al. Domain-specific keyphrase extraction, 2005, CIKM '05.

[9] William R. Hersh, et al. Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries, 2002.

[10] Gordon W. Paynter, et al. Human evaluation of Kea, an automatic keyphrasing system, 2001, JCDL '01.

[11] Ian H. Witten, et al. Thesaurus based automatic keyphrase indexing, 2006, Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '06).

[12] Ian Witten, et al. Data Mining, 2000.

[13] Susan T. Dumais, et al. Inductive learning algorithms and representations for text categorization, 1998, CIKM '98.

[14] Peter D. Turney. Learning to Extract Keyphrases from Text, 2002, ArXiv.