Automatic Documents Annotation by Keyphrase Extraction in Digital Libraries Using Taxonomy

Keyphrases are useful for a variety of purposes, including text clustering, classification, content-based retrieval, and automatic text summarization. Only a small number of documents have author-assigned keyphrases. Manually assigning keyphrases to existing documents is a tedious task; therefore, automatic keyphrase extraction has been used extensively to organize documents. Existing automatic keyphrase extraction algorithms are limited in their ability to assign semantically relevant keyphrases to documents. In this paper, we propose a methodology for assigning keyphrases to digital documents. Our approach exploits the semantic relationships and hierarchical structure of the classification scheme to filter out irrelevant keyphrases suggested by the Keyphrase Extraction Algorithm (KEA++). Experiments demonstrate that the refinement improves the precision of extracted keyphrases from 0.19% to 0.38% while maintaining the same recall.
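
As a rough illustration of why filtering out irrelevant candidates can raise precision while leaving recall unchanged, the following minimal sketch applies a toy taxonomy filter to a candidate list and compares both measures before and after. The taxonomy, the `refine` rule, the `focus` node, and all terms are hypothetical; this is not the refinement procedure of the paper, nor the KEA++ interface.

```python
# Toy taxonomy as child -> parent links (hypothetical, for illustration only).
PARENT = {
    "information retrieval": "computer science",
    "digital libraries": "information retrieval",
    "indexing": "information retrieval",
    "genetics": "biology",
    "soil science": "agriculture",
}

def ancestors(term):
    """Return the chain of ancestors of a term, nearest first."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain

def refine(candidates, focus):
    """Keep candidates that equal the focus node or lie below it (assumed rule)."""
    return [c for c in candidates if c == focus or focus in ancestors(c)]

def precision_recall(extracted, gold):
    """Precision = correct / extracted; recall = correct / gold."""
    extracted, gold = set(extracted), set(gold)
    correct = extracted & gold
    precision = len(correct) / len(extracted) if extracted else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical gold-standard keyphrases and extractor output.
gold = ["digital libraries", "indexing", "metadata"]
candidates = ["digital libraries", "indexing", "genetics", "soil science"]
refined = refine(candidates, focus="information retrieval")

before = precision_recall(candidates, gold)
after = precision_recall(refined, gold)
print("before:", before)
print("after: ", after)
```

In this toy case the filter removes only candidates that were not in the gold set, so the number of correct keyphrases is unchanged: recall stays the same while precision rises because the denominator shrinks.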
