Collaborative Ranking between Supervised and Unsupervised Approaches for Keyphrase Extraction

Automatic keyphrase extraction methods have generally been either supervised or unsupervised. Supervised methods learn to extract keyphrases from a training document set, thus acquiring knowledge from a global collection of texts. Unsupervised methods, by contrast, extract keyphrases by determining their relevance within a single document, without prior learning. We present HybridRank, a hybrid keyphrase extraction method for short articles that leverages the benefits of both approaches. Our system implements modified versions of the unsupervised TextRank method (Mihalcea and Tarau, 2004) and the supervised KEA method (Witten et al., 1999), and applies a merging algorithm to combine their outputs into a single overall list of keyphrases. We tested HybridRank on more than 900 abstracts covering a wide variety of subjects, and the results show that it is more effective than either method applied on its own. We conclude that knowledge collaboration between supervised and unsupervised methods can produce higher-quality keyphrases than applying these methods individually.
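
The abstract does not specify how the merging step works. As a rough illustration only of what combining two ranked keyphrase lists can look like, the sketch below uses reciprocal rank fusion, a generic rank-merging technique swapped in here as an assumption; it is not HybridRank's merging algorithm. The function name `merge_rankings`, the constant `k`, and the two input lists are all hypothetical.

```python
from collections import defaultdict


def merge_rankings(unsupervised, supervised, k=60):
    """Fuse two ranked keyphrase lists with reciprocal rank fusion.

    Assumption: this is a generic stand-in, not the merging algorithm
    used by HybridRank, which the abstract does not describe.
    """
    scores = defaultdict(float)
    for ranking in (unsupervised, supervised):
        # Rank 1 is the best candidate; higher ranks contribute less.
        for rank, phrase in enumerate(ranking, start=1):
            scores[phrase] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))


# Hypothetical outputs of an unsupervised and a supervised extractor.
textrank_like = ["keyphrase extraction", "ranking", "graph"]
kea_like = ["keyphrase extraction", "training set", "ranking"]

merged = merge_rankings(textrank_like, kea_like)
print(merged)
```

Phrases proposed by both extractors accumulate score from each list, so they tend to rise above phrases proposed by only one, which is one simple way the two sources of knowledge could reinforce each other.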

[1] Jing-Song Hu, et al. Automatic Keyphrases Extraction from Document Using Neural Network, 2005, ICMLC.

[2] Mitsuru Ishizuka, et al. Keyword extraction from a single document using word co-occurrence statistical information, 2004, Int. J. Artif. Intell. Tools.

[3] Taeho Jo, et al. Keyword Extraction from Documents Using a Neural Network Model, 2006, 2006 International Conference on Hybrid Information Technology.

[4] Taeho Jo. Neural Based Approach to Keyword Extraction from Documents, 2003, ICCSA.

[5] Carl Gutwin, et al. KEA: practical automatic keyphrase extraction, 1999, DL '99.

[6] Carl Gutwin, et al. Domain-Specific Keyphrase Extraction, 1999, IJCAI.

[7] Min-Yen Kan, et al. Keyphrase Extraction in Scientific Publications, 2007, ICADL.

[8] Hans Peter Luhn, et al. A Statistical Approach to Mechanized Encoding and Searching of Literary Information, 1957, IBM J. Res. Dev.

[9] Clement T. Yu, et al. A theory of term importance in automatic text analysis, 1974, J. Am. Soc. Inf. Sci.

[10] Jonathan D. Cohen. Highlights: language- and domain-independent automatic indexing terms for abstracting, 1995.

[11] Xiaojun Wan, et al. CollabRank: Towards a Collaborative Approach to Single-Document Keyphrase Extraction, 2008, COLING.

[12] Chengzhi Zhang, et al. Automatic Keyword Extraction from Documents Using Conditional Random Fields, 2008.

[13] Anette Hulth, et al. Improved Automatic Keyword Extraction Given More Linguistic Knowledge, 2003, EMNLP.

[14] Joshua Goodman, et al. Finding advertising keywords on web pages, 2006, WWW '06.

[15] Xiaojun Wan, et al. Towards an Iterative Reinforcement Approach for Simultaneous Document Summarization and Keyword Extraction, 2007, ACL.

[16] Rada Mihalcea, et al. TextRank: Bringing Order into Text, 2004, EMNLP.

[17] Sergey Brin, et al. The Anatomy of a Large-Scale Hypertextual Web Search Engine, 1998, Comput. Networks.

[18] Peter D. Turney. Coherent Keyphrase Extraction via Web Mining, 2003, IJCAI.

[19] Peter D. Turney. Learning Algorithms for Keyphrase Extraction, 2000, Information Retrieval.

[20] Mita Nasipuri, et al. A New Approach to Keyphrase Extraction Using Neural Networks, 2010, ArXiv.