Estimating Keyphrase Popularity in Sampling Collections

The problem of structured data representation has high practical value and is particularly relevant given the growth of data volumes. Representations such as topic graphs and concept trees are a convenient way to present information retrieved from a collection of documents. In this paper, we study several aspects of using a collection of samples to estimate the popularity of concepts. Such popularity estimates can be used to visualize concept significance and to rank concepts in structured-representation tasks.
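The abstract does not specify which estimator is used, so the following is only a minimal sketch under stated assumptions. It assumes that a keyphrase's "popularity" is the share of documents in the full collection that contain it, that the sample is drawn uniformly at random without replacement, and that a normal-approximation confidence interval with finite population correction is acceptable. The function names `estimate_popularity` and `rank_keyphrases` are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter
from typing import Dict, Sequence, Set, Tuple


def estimate_popularity(
    sample: Sequence[Set[str]],
    population_size: int,
    z: float = 1.96,
) -> Dict[str, Tuple[float, float, float]]:
    """Estimate each keyphrase's document frequency in the full collection.

    Assumption (not from the paper): `sample` holds the keyphrase sets of n
    documents drawn uniformly without replacement from `population_size`
    documents. Returns phrase -> (estimated share, lower bound, upper bound).
    """
    n = len(sample)
    if n == 0 or n > population_size:
        raise ValueError("need 0 < sample size <= population size")
    # Each document is a set, so a phrase is counted at most once per document.
    counts = Counter(phrase for doc in sample for phrase in doc)
    # Finite population correction: the variance shrinks to 0 as n approaches N.
    fpc = (population_size - n) / (population_size - 1) if population_size > 1 else 0.0
    result = {}
    for phrase, c in counts.items():
        p = c / n
        se = math.sqrt(p * (1 - p) / n * fpc)
        # Normal-approximation interval, clipped to the valid range [0, 1].
        result[phrase] = (p, max(0.0, p - z * se), min(1.0, p + z * se))
    return result


def rank_keyphrases(estimates: Dict[str, Tuple[float, float, float]]) -> list:
    """Rank phrases by estimated share (descending), ties broken alphabetically."""
    return sorted(estimates, key=lambda ph: (-estimates[ph][0], ph))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical synthetic collection: each document is a set of keyphrases.
    vocab = ["topic graph", "concept tree", "keyphrase", "sampling"]
    collection = [set(rng.sample(vocab, rng.randint(1, 3))) for _ in range(1000)]
    sample = rng.sample(collection, 100)
    est = estimate_popularity(sample, len(collection))
    for phrase in rank_keyphrases(est):
        share, lo, hi = est[phrase]
        print(f"{phrase}: {share:.2f} [{lo:.2f}, {hi:.2f}]")
```

The interval width indicates how far a ranking derived from a small sample can be trusted: two keyphrases whose intervals overlap heavily may swap places in the full collection. A different popularity definition (for example, total occurrence counts rather than document frequency) would need a different estimator.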
