CollabRank: Towards a Collaborative Approach to Single-Document Keyphrase Extraction

Previous methods usually extract keyphrases from each document separately, with no interaction between documents, under the assumption that documents are independent of one another. This paper proposes a novel approach named CollabRank for collaborative single-document keyphrase extraction, which makes use of the mutual influences of multiple documents within a cluster context. CollabRank first employs a clustering algorithm to obtain appropriate document clusters, and then uses a graph-based ranking algorithm to perform collaborative single-document keyphrase extraction within each cluster. Experimental results demonstrate the encouraging performance of the proposed approach. Different clustering algorithms have been investigated, and we find that system performance improves with the quality of the document clusters.
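The two-stage pipeline described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the graph construction or the ranking formula, so this sketch assumes a word co-occurrence graph built over all documents in a cluster and a PageRank-style iteration. The function names (`cooccurrence_graph`, `rank_words`, `collaborative_keywords`), the `window`, `damping`, and `top_k` parameters, and the input layout are all hypothetical. Cluster assignments are taken as given (from any clustering algorithm), and assembling ranked words into multi-word keyphrases is omitted.

```python
from collections import defaultdict


def cooccurrence_graph(docs, window=2):
    """Build an undirected weighted word graph from ALL documents in one cluster.

    Assumption: two words are linked when they occur within `window` tokens of
    each other (window=2 links adjacent words only).
    """
    graph = defaultdict(lambda: defaultdict(float))
    for tokens in docs:
        for i, w in enumerate(tokens):
            for v in tokens[i + 1 : i + window]:
                if v != w:
                    graph[w][v] += 1.0
                    graph[v][w] += 1.0
    return graph


def rank_words(graph, damping=0.85, iterations=50):
    """PageRank-style scoring on the weighted graph (an assumed ranking scheme)."""
    nodes = list(graph)
    if not nodes:
        return {}
    score = {w: 1.0 / len(nodes) for w in nodes}
    out_weight = {w: sum(graph[w].values()) for w in nodes}
    for _ in range(iterations):
        new_score = {}
        for w in nodes:
            # Each neighbour v passes on a share of its score proportional
            # to the weight of the edge between v and w.
            incoming = sum(score[v] * graph[v][w] / out_weight[v] for v in graph[w])
            new_score[w] = (1 - damping) / len(nodes) + damping * incoming
        score = new_score
    return score


def collaborative_keywords(clusters, top_k=3):
    """Rank each document's words using scores computed over its whole cluster.

    `clusters` is a list of clusters; each cluster is a list of
    (doc_id, tokens) pairs. This layout is hypothetical.
    """
    result = {}
    for cluster in clusters:
        graph = cooccurrence_graph([tokens for _, tokens in cluster])
        scores = rank_words(graph)
        for doc_id, tokens in cluster:
            # Only words that occur in the document itself are candidates,
            # but their scores reflect evidence from every cluster member.
            ranked = sorted(set(tokens), key=lambda w: (-scores.get(w, 0.0), w))
            result[doc_id] = ranked[:top_k]
    return result
```

Calling `collaborative_keywords` with a single cluster containing two tokenized documents returns a dictionary mapping each document id to its top-ranked words. The point of the design is that the graph is shared across the cluster, so a word's score in one document can be raised by its usage in neighbouring documents.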
