A ranking approach to keyphrase extraction

This paper addresses the problem of automatically extracting keyphrases from a document. Previous work formalized the problem as classification and applied learning methods for classification. This paper argues that it is more appropriate to cast the problem as ranking and to apply a learning-to-rank method. Specifically, it employs Ranking SVM, a state-of-the-art learning-to-rank method, for keyphrase extraction. Experimental results on three datasets show that Ranking SVM significantly outperforms the baseline methods, SVM and Naive Bayes, indicating that learning-to-rank techniques are better suited to keyphrase extraction.
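To make the ranking formulation concrete, the following is a minimal sketch of the pairwise idea behind Ranking SVM: within each document, every pair of candidate phrases with different relevance labels yields a feature-difference vector, and a linear model is trained with a hinge loss so that the more relevant phrase scores higher. The two features (a TF-IDF-like weight and a relative first-occurrence position), the toy labels, the hyperparameters, and the plain subgradient solver are all assumptions made for illustration; they are not the paper's feature set or its solver.

```python
import random

def pairwise_differences(features, labels):
    """For one document, return x_i - x_j for every pair with label_i > label_j."""
    diffs = []
    for xi, yi in zip(features, labels):
        for xj, yj in zip(features, labels):
            if yi > yj:
                diffs.append([a - b for a, b in zip(xi, xj)])
    return diffs

def train_ranking_svm(docs, dim, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Subgradient descent on an L2-regularised pairwise hinge loss (illustrative solver)."""
    rng = random.Random(seed)
    # Pairs are formed only within a document, never across documents.
    diffs = [d for feats, labels in docs for d in pairwise_differences(feats, labels)]
    w = [0.0] * dim
    for _ in range(epochs):
        rng.shuffle(diffs)
        for d in diffs:
            margin = sum(wk * dk for wk, dk in zip(w, d))
            for k in range(dim):
                # The hinge term contributes only while the pair violates the margin.
                grad = reg * w[k] - (d[k] if margin < 1 else 0.0)
                w[k] -= lr * grad
    return w

def rank_phrases(w, phrases, features):
    """Sort candidate phrases by descending linear score."""
    scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in features]
    return [p for _, p in sorted(zip(scores, phrases), reverse=True)]

# Hypothetical training data: (feature vectors, relevance labels) per document.
# Features: [tf-idf-like weight, relative position of first occurrence].
docs = [
    ([[0.9, 0.1], [0.5, 0.4], [0.1, 0.9]], [2, 1, 0]),
    ([[0.8, 0.2], [0.2, 0.7]], [1, 0]),
]
w = train_ranking_svm(docs, dim=2)

# Rank the candidates of an unseen document.
candidates = ["learning to rank", "this paper"]
candidate_features = [[0.85, 0.15], [0.1, 0.8]]
ranked = rank_phrases(w, candidates, candidate_features)
print(ranked)
```

Because only label differences within a document enter training, the model learns a relative ordering of phrases rather than an absolute keyphrase/non-keyphrase decision, which is the distinction the abstract draws between ranking and classification.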
