A Supervised KeyPhrase Extraction System

In this paper, we present a multi-featured supervised automatic keyword extraction system. We extract salient semantic features that describe candidate keyphrases and use them to train a Random Forest classifier. The system achieved a precision of 58.3% and was shown to outperform two top-performing systems when benchmarked on a crowdsourced dataset. Furthermore, our approach achieved a personal best precision and F-measure of 32.7 and 25.5, respectively, on the SemEval keyphrase extraction challenge dataset. The paper describes the approaches used as well as the results obtained.
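For readers unfamiliar with the reported metrics, the following minimal sketch shows how precision, recall and F-measure can be computed for a set of extracted keyphrases against a gold-standard set. It assumes exact matching after lowercasing and whitespace normalisation; the matching scheme actually used in the evaluation is not stated here and may differ (for example, matching on stemmed forms). The function name and the example phrases are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall and F-measure over keyphrase sets."""
    # Assumption: normalise case and whitespace so trivially different
    # spellings of the same phrase count as a match.
    pred = {" ".join(p.lower().split()) for p in predicted}
    ref = {" ".join(g.lower().split()) for g in gold}

    correct = len(pred & ref)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example data, for illustration only.
predicted = ["random forest", "keyphrase extraction", "semantic features", "dataset"]
gold = ["Keyphrase Extraction", "random forest", "supervised learning"]

p, r, f = precision_recall_f1(predicted, gold)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

In this toy case two of the four predicted phrases appear in the gold set of three, so precision is 2/4 and recall is 2/3; the F-measure is their harmonic mean.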
