Automatic Keyphrase Extractor from Arabic Documents

A keyphrase is a sentence, or part of a sentence, consisting of a sequence of words that expresses the meaning and purpose of a given paragraph. Keyphrase extraction is the task of identifying candidate keyphrases in a given document. Many applications, including text summarization, indexing, and characterization, rely on keyphrase extraction, and it is essential for improving the performance of information retrieval systems. The internet contains a massive number of documents, only some of which have manually assigned keyphrases. Arabic is one of the world's important languages, and the number of online Arabic documents is growing rapidly. Most of these documents have no manually assigned keyphrases, so users must scan each retrieved web document in full. To avoid this, keyphrases need to be assigned to each web document, either manually or automatically. This paper addresses the problem of identifying keyphrases in Arabic documents automatically. We present a novel algorithm, Automatic Keyphrases Extraction from Arabic (AKEA), that extracts keyphrases from Arabic text automatically. To test the algorithm, we collected a dataset of 100 documents from Arabic wiki and downloaded another 56 agricultural documents from the Food and Agriculture Organization of the United Nations (FAO). The evaluation results show that the system achieves 83% precision in identifying 2-word and 3-word keyphrases in the agricultural domain.
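For readers unfamiliar with the metric, precision here can be read as the fraction of extracted keyphrases that also appear among the reference keyphrases. The abstract does not describe the exact evaluation protocol or matching rule, so the following is only a minimal sketch assuming exact string matching; the function name `precision` and the sample phrases are hypothetical and are not taken from the paper's dataset.

```python
def precision(extracted, reference):
    """Fraction of extracted keyphrases that also appear in the reference set.

    Assumption: phrases match only if they are identical after trimming
    surrounding whitespace. The paper may use a different matching rule.
    """
    extracted_set = {phrase.strip() for phrase in extracted}
    reference_set = {phrase.strip() for phrase in reference}
    if not extracted_set:
        return 0.0  # nothing extracted, so define precision as 0
    return len(extracted_set & reference_set) / len(extracted_set)


# Hypothetical 2-word keyphrases for one agricultural document.
extracted = ["الري بالتنقيط", "التربة الزراعية", "إنتاج القمح", "مياه الشرب"]
reference = ["الري بالتنقيط", "التربة الزراعية", "إنتاج القمح", "الأمن الغذائي"]

score = precision(extracted, reference)
print(f"precision = {score:.2f}")
```

In this toy case three of the four extracted phrases match the reference set. Exact matching is the strictest choice; a real evaluation for Arabic would likely normalize or stem phrases before comparing them.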
