Reducing Over-generation Errors for Automatic Keyphrase Extraction using Integer Linear Programming

We introduce a global inference model for keyphrase extraction that reduces over-generation errors by weighting sets of keyphrase candidates according to their component words. Our model can be applied on top of any supervised or unsupervised word weighting function. Experimental results show a substantial improvement over commonly used word-based ranking approaches.
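
To illustrate the idea of scoring sets of candidates by their component words, here is a minimal sketch. It is not the model's actual objective or solver: the function name `select_keyphrases`, the toy word weights, and the exhaustive search (used here in place of an integer linear programming solver) are all assumptions made for illustration. Each word contributes its weight only once per selected set, so a redundant candidate adds nothing.

```python
from itertools import combinations


def select_keyphrases(candidates, word_weights, max_phrases):
    """Return the candidate subset whose covered words have the largest total weight.

    Hypothetical helper: brute-force search over subsets of at most
    `max_phrases` candidates, standing in for an ILP solver.
    """
    best_set, best_score = (), float("-inf")
    for k in range(1, max_phrases + 1):
        for subset in combinations(candidates, k):
            # A word is counted once, however many selected phrases contain it.
            covered = set()
            for phrase in subset:
                covered.update(phrase.split())
            score = sum(word_weights.get(w, 0.0) for w in covered)
            if score > best_score:
                best_set, best_score = subset, score
    return list(best_set), best_score


# Toy word weights (assumed values; any word weighting function could supply them).
weights = {
    "keyphrase": 0.9, "extraction": 0.8, "automatic": 0.2,
    "integer": 0.5, "linear": 0.4, "programming": 0.45,
}
candidates = [
    "keyphrase extraction",
    "automatic keyphrase extraction",
    "keyphrase",
    "integer linear programming",
    "extraction",
]

chosen, score = select_keyphrases(candidates, weights, max_phrases=2)
print(chosen)
# ['automatic keyphrase extraction', 'integer linear programming']
```

Ranking each candidate independently by the sum of its word weights would place "automatic keyphrase extraction" and "keyphrase extraction" in the top two, even though the second adds no new words. Scoring the set as a whole avoids that redundancy in this toy case.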
