KP-Miner: A keyphrase extraction system for English and Arabic documents

Automatic keyphrase extraction has many important applications, including, but not limited to, summarization, cataloging/indexing, feature extraction for clustering and classification, and data mining. This paper presents the KP-Miner system and demonstrates, through experimentation and comparison with widely used systems, that it extracts keyphrases effectively and efficiently from both English and Arabic documents of varied length. Unlike other existing keyphrase extraction systems, KP-Miner does not need to be trained on a particular document set in order to perform its task. It also has the advantage of being configurable, because the rules and heuristics it adopts relate to the general nature of documents and keyphrases. Users can therefore draw on their understanding of the input document(s) to fine-tune the system to their particular needs.
