Latent Keyphrase Extraction Using LDA Model

As the number of available documents continues to grow, automatically extracting keyphrases from documents has become an important research problem. However, most previous work extracts keyphrases only from the words that appear in a document. It therefore overlooks latent keyphrases, i.e., keyphrases that do not appear in the document itself. Although latent keyphrases are absent from the text, they can play an important role in text summarization and information retrieval because they capture meaningful concepts and content of the document. Moreover, they account for more than one quarter of all keyphrases in real-world datasets. They are also useful for short texts such as SNS posts, which rarely contain explicit keyphrases. In this paper, we propose a new approach that selects candidate keyphrases from the keyphrases of neighbor documents, i.e., documents similar to the given document, and evaluates the importance of each candidate based on the individual words it contains. Experimental results show that latent keyphrases can be extracted with reasonable performance.
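
The abstract describes the pipeline only at a high level: find similar neighbor documents, pool their keyphrases as candidates, and score each candidate through its individual words. The following is a minimal sketch under stated assumptions, not the paper's method. Each document is assumed to be represented by an LDA topic distribution p(z|d), as the title suggests, with neighbors found by cosine similarity. A word's relevance to the target document is assumed to be p(w|d) ≈ Σ_z p(z|d)·p(w|z), and a candidate's score is assumed to be the mean relevance of its words. The function names (cosine, word_score, latent_keyphrases), the averaging rule, and the parameters k and top_n are hypothetical illustrations.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def word_score(word, doc_topics, topic_words):
    """ASSUMED word relevance: p(w|d) ~ sum_z p(z|d) * p(w|z) under an LDA model."""
    return sum(p_z * topic_words[z].get(word, 0.0) for z, p_z in enumerate(doc_topics))


def latent_keyphrases(target_topics, corpus, topic_words, k=2, top_n=3):
    """Hypothetical sketch: rank keyphrases borrowed from the k most similar documents.

    target_topics: topic distribution p(z|d) of the given document.
    corpus: list of (topic_distribution, keyphrase_list) for documents
            whose keyphrases are known.
    topic_words: list indexed by topic z of dicts {word: p(w|z)}.
    """
    # 1. Neighbor documents: the k most similar by topic-distribution cosine.
    neighbors = sorted(corpus, key=lambda doc: cosine(target_topics, doc[0]),
                       reverse=True)[:k]
    # 2. Candidates: the union of the neighbors' keyphrases. These need not
    #    occur in the target text, which is what makes them "latent".
    candidates = {phrase for _, phrases in neighbors for phrase in phrases}
    # 3. ASSUMED scoring rule: mean relevance of the candidate's individual words.
    scored = {}
    for phrase in candidates:
        words = phrase.split()
        scored[phrase] = sum(word_score(w, target_topics, topic_words)
                             for w in words) / len(words)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


if __name__ == "__main__":
    # Toy two-topic model and corpus, invented purely for illustration.
    topic_words = [
        {"neural": 0.4, "network": 0.4, "learning": 0.2},
        {"query": 0.5, "retrieval": 0.3, "index": 0.2},
    ]
    corpus = [
        ([0.9, 0.1], ["neural network", "deep learning"]),
        ([0.8, 0.2], ["neural network", "backpropagation"]),
        ([0.1, 0.9], ["information retrieval", "query expansion"]),
    ]
    for phrase, score in latent_keyphrases([0.85, 0.15], corpus, topic_words):
        print(f"{phrase}: {score:.3f}")
```

In this toy run the two topic-0 documents are selected as neighbors, so the retrieval-oriented phrases never become candidates. A phrase such as "backpropagation" is proposed even though the target document is described only by its topic mixture. Averaging rather than summing word scores is one design choice among several: it keeps longer phrases from being favored merely for having more words.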
