Latent keyphrase generation by combining contextually similar primitive words

As the number of document resources continues to grow, automatically extracting keyphrases from documents has become an important problem. However, most previous work overlooks keyphrases that do not appear in the document. Although such latent keyphrases are absent from the text, they can be important because they represent meaningful concepts or key points of the document. We found that latent keyphrases account for more than one fourth of all keyphrases. Latent keyphrases play as important a role as keyphrases that do appear in the document. In this paper, we propose an approach for finding the latent keyphrases of a document within a given document set. The main idea is to generate keyphrases by choosing primitive words and combining them according to their context in the documents. Experimental results show that latent keyphrases can be extracted with our approach.
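
To make the main idea concrete, the following is a minimal sketch of one way to combine primitive words by contextual similarity. It is not the method of the paper, whose details the abstract does not give. Everything here is an assumption made for illustration: the function names (`context_vectors`, `cosine`, `latent_candidates`), the co-occurrence window, the cosine threshold, and the premise that the primitive words have already been chosen by some earlier step. Word order within a generated phrase is not modeled.

```python
import math
from collections import Counter
from itertools import combinations


def context_vectors(docs, window=2):
    """Map each word to counts of the words seen within `window` positions of it.

    `docs` is a list of token lists (the document set). Hypothetical helper.
    """
    vectors = {}
    for tokens in docs:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            ctx = vectors.setdefault(word, Counter())
            for j in range(lo, hi):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def latent_candidates(target, docs, primitives, threshold=0.3):
    """Pair up primitive words whose contexts are similar across `docs`.

    A pair is kept only if the two words are never adjacent in `target`,
    i.e. the phrase does not already appear there (assumed meaning of "latent").
    Returns (phrase, score) tuples sorted by descending score.
    """
    vectors = context_vectors(docs)
    present = set(zip(target, target[1:]))
    candidates = []
    for a, b in combinations(primitives, 2):
        if (a, b) in present or (b, a) in present:
            continue  # already an existing phrase in the target document
        score = cosine(vectors.get(a, Counter()), vectors.get(b, Counter()))
        if score >= threshold:
            candidates.append((f"{a} {b}", score))
    return sorted(candidates, key=lambda item: -item[1])


# Toy document set; the first document is the target.
docs = [["deep", "network", "model"], ["deep", "graph", "model"]]
result = latent_candidates(docs[0], docs, ["network", "graph", "model"])
phrases = [phrase for phrase, _ in result]
```

In this toy run, "network" and "model" are adjacent in the target document, so that pair is skipped, while pairs involving "graph" (which appears only elsewhere in the document set) can surface as latent candidates. A realistic system would also need a principled way to select primitive words and to order the words in each phrase.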
