Ranking in keyphrase extraction problem: is it suitable to use statistics of words occurrences?

Abstract . The paper deals with keyphrase extraction problem for single documents, e.g. scientific abstracts. Keyphrase extraction task is important and its results could be used in a variety of applications: data indexing, clustering and classification of documents, meta-information extraction, automatic ontologies creation etc. In the paper we discuss an approach to keyphrase extraction, its’ first step is building of candidate phrases which are then ranked and the best are selected as keyphrases. The paper is focused on the evaluation of weighting approaches to candidate phrases in the unsupervised ex-traction methods. A number of in-phrase word weighting procedures is evaluated. Unsuitable approaches to weighting are identified. Testing of some approaches shows their equivalence as applied to keyphrase extraction. A feature, which allows to increase the quality of extracted keyphrases and shows better results in comparison to the state of the art, is proposed. Experiments are based on Inspec dataset.

[1]  Dmitry Mouromtsev,et al.  Sci-Search: Academic Search and Analysis System Based on Keyphrases , 2013, KESW.

[2]  Xiaojun Wan,et al.  Single Document Keyphrase Extraction Using Neighborhood Knowledge , 2008, AAAI.

[3]  Dell Zhang,et al.  Semantic, Hierarchical, Online Clustering of Web Search Results , 2004, APWeb.

[4]  Peter D. Turney Learning to Extract Keyphrases from Text , 2002, ArXiv.

[5]  Vincent Ng,et al.  Conundrums in Unsupervised Keyphrase Extraction: Making Sense of the State-of-the-Art , 2010, COLING.

[6]  Hinrich Schütze,et al.  Introduction to information retrieval , 2008 .

[7]  Anette Hulth,et al.  Improved Automatic Keyword Extraction Given More Linguistic Knowledge , 2003, EMNLP.

[8]  Wei-Ying Ma,et al.  Learning to cluster web search results , 2004, SIGIR '04.

[9]  Timothy Baldwin,et al.  Automatic keyphrase extraction from scientific articles , 2013, Lang. Resour. Evaluation.

[10]  Xiaojun Wan,et al.  Exploiting neighborhood knowledge for single document summarization and keyphrase extraction , 2010, TOIS.

[11]  Carl Gutwin,et al.  Domain-Specific Keyphrase Extraction , 1999, IJCAI.

[12]  Ahmed A. Rafea,et al.  KP-Miner: A keyphrase extraction system for English and Arabic documents , 2009, Inf. Syst..

[13]  Попова Светлана Владимировна,et al.  ИЗВЛЕЧЕНИЕ И РАНЖИРОВАНИЕ КЛЮЧЕВЫХ ФРАЗ В ЗАДАЧЕ АННОТИРОВАНИЯ , 2013 .

[14]  David W. Patterson,et al.  Contextual Document Clustering , 2004, ECIR.

[15]  Carl Gutwin,et al.  Improving browsing in digital libraries with keyphrase indexes , 1999, Decis. Support Syst..

[16]  Iraklis Varlamis,et al.  SemanticRank: Ranking Keywords and Sentences Using Semantic Graphs , 2010, COLING.

[17]  Iryna Gurevych,et al.  Approximate Matching for Evaluating Keyphrase Extraction , 2009, RANLP.

[18]  Wei You,et al.  An automatic keyphrase extraction system for scientific documents , 2012, Knowledge and Information Systems.

[19]  Antonina Dattolo,et al.  Automatic keyphrase extraction and ontology mining for content-based tag recommendation , 2010 .

[20]  Rada Mihalcea,et al.  TextRank: Bringing Order into Text , 2004, EMNLP.