Paper information - Collaborative Ranking between Supervised and Unsupervised Approaches for Keyphrase Extraction

Automatic keyphrase extraction methods have generally taken either supervised or unsupervised approaches. Supervised methods extract keyphrases by using a training document set, thus acquiring knowledge from a global collection of texts. Conversely, unsupervised methods extract keyphrases by determining their relevance in a single-document context, without prior learning. We present a hybrid keyphrase extraction method for short articles, HybridRank, which leverages the benefits of both approaches. Our system implements modified versions of the unsupervised TextRank method (Mihalcea and Tarau, 2004) and the supervised KEA method (Witten et al., 1999), and applies a merging algorithm to produce an overall list of keyphrases. We have tested HybridRank on more than 900 abstracts covering a wide variety of subjects, and show that it is more effective than either method on its own. We conclude that knowledge collaboration between supervised and unsupervised methods can produce higher-quality keyphrases than applying these methods individually.
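The abstract does not spell out the merging algorithm, so the following is only a minimal sketch of what combining two ranked keyphrase lists could look like. It assumes a weighted, Borda-style rank fusion, which is a stand-in technique and not HybridRank's actual merging step. The function name `merge_rankings`, the `weight` parameter, and the sample phrases are all hypothetical.

```python
from collections import defaultdict


def merge_rankings(unsupervised, supervised, weight=0.5, top_k=5):
    """Merge two ranked keyphrase lists with a weighted Borda-style score.

    Illustrative assumption only: this is NOT the merging algorithm
    described in the paper, just one simple way to fuse two rankings.
    `weight` is the share given to the unsupervised list.
    """
    scores = defaultdict(float)
    for share, ranking in ((weight, unsupervised), (1.0 - weight, supervised)):
        n = len(ranking)
        for position, phrase in enumerate(ranking):
            # The top phrase of a list contributes `share`, the last one
            # contributes `share / n`; phrases absent from a list add 0.
            scores[phrase.lower()] += share * (n - position) / n
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [phrase for phrase, _ in ranked[:top_k]]


# Hypothetical outputs of a TextRank-like and a KEA-like extractor.
textrank_like = ["keyphrase extraction", "graph ranking", "short articles"]
kea_like = ["keyphrase extraction", "training documents", "graph ranking"]

merged = merge_rankings(textrank_like, kea_like, top_k=3)
print(merged)
```

In this sketch, a phrase ranked highly by both extractors rises to the top, while a phrase proposed by only one of them can still appear in the merged list. That is the general intuition behind combining a corpus-trained ranker with a single-document ranker.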

Yi-Shin Chen | Gerardo Figueroa

[1] Jing-Song Hu, et al. Automatic Keyphrases Extraction from Document Using Neural Network, 2005, ICMLC.

[2] Mitsuru Ishizuka, et al. Keyword extraction from a single document using word co-occurrence statistical information, 2004, Int. J. Artif. Intell. Tools.

[3] Taeho Jo, et al. Keyword Extraction from Documents Using a Neural Network Model, 2006, 2006 International Conference on Hybrid Information Technology.

[4] Taeho Jo. Neural Based Approach to Keyword Extraction from Documents, 2003, ICCSA.

[5] Carl Gutwin, et al. KEA: practical automatic keyphrase extraction, 1999, DL '99.

[6] Carl Gutwin, et al. Domain-Specific Keyphrase Extraction, 1999, IJCAI.

[7] Min-Yen Kan, et al. Keyphrase Extraction in Scientific Publications, 2007, ICADL.

[8] Hans Peter Luhn, et al. A Statistical Approach to Mechanized Encoding and Searching of Literary Information, 1957, IBM J. Res. Dev.

[9] Clement T. Yu, et al. A theory of term importance in automatic text analysis, 1974, J. Am. Soc. Inf. Sci.

[10] Jonathan D. Cohen. Highlights: language- and domain-independent automatic indexing terms for abstracting, 1995.

[11] Xiaojun Wan, et al. CollabRank: Towards a Collaborative Approach to Single-Document Keyphrase Extraction, 2008, COLING.