A New Domain Independent Keyphrase Extraction System

In this paper we present a keyphrase extraction system that extracts potential keyphrases from a single document in an unsupervised, domain-independent way. We extract word n-grams from the input document. We combine linguistic knowledge (i.e., part-of-speech tags) with statistical information about each n-gram (i.e., frequency, position, lifespan) to define candidate phrases and their feature sets. The proposed approach can be applied to any document; however, to assess its effectiveness for digital libraries, we evaluated the system on a set of scientific documents and compared our results with those of current keyphrase extraction systems.
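The candidate-generation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The input is assumed to be already POS-tagged with Penn Treebank tags. The adjective/noun filtering pattern, the maximum n-gram length, and the feature definitions are all assumptions made for illustration: position is taken as the relative offset of the first occurrence, and lifespan as the relative distance between the first and last occurrence. Names such as `extract_candidates` and `NGramFeatures` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumption: Penn Treebank-style noun and adjective tags from an external tagger.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
ADJ_TAGS = {"JJ", "JJR", "JJS"}
ALLOWED_TAGS = NOUN_TAGS | ADJ_TAGS


@dataclass
class NGramFeatures:
    frequency: int         # number of occurrences in the document
    first_position: float  # offset of the first occurrence / document length
    lifespan: float        # (last offset - first offset) / document length


def is_candidate(tags):
    """Assumed filter: only adjectives/nouns, and the n-gram must end in a noun."""
    return tags[-1] in NOUN_TAGS and all(t in ALLOWED_TAGS for t in tags)


def extract_candidates(tagged_tokens, max_n=3):
    """Map each candidate n-gram to hypothetical frequency/position/lifespan features.

    tagged_tokens: list of (word, pos_tag) pairs for the whole document.
    """
    total = len(tagged_tokens)
    occurrences = defaultdict(list)  # phrase -> start offsets, in ascending order
    for n in range(1, max_n + 1):
        for start in range(total - n + 1):
            window = tagged_tokens[start:start + n]
            if is_candidate([tag for _, tag in window]):
                phrase = " ".join(word.lower() for word, _ in window)
                occurrences[phrase].append(start)

    features = {}
    for phrase, starts in occurrences.items():
        first, last = starts[0], starts[-1]
        features[phrase] = NGramFeatures(
            frequency=len(starts),
            first_position=first / total,
            lifespan=(last - first) / total,
        )
    return features
```

For example, on the tagged sentence pair "Keyphrase extraction helps digital libraries. Keyphrase extraction is unsupervised." the phrase "keyphrase extraction" occurs twice, starting at the first token, so it receives a frequency of 2, a first position of 0, and a lifespan of 6/11. Normalizing by document length keeps the position and lifespan features comparable across documents of different sizes; how the features are combined to rank candidates is left open in this sketch.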
