Keyphrase Extraction in Scientific Articles: A Supervised Approach

This paper describes an approach to the automatic extraction of keyphrases from scientific articles (i.e., research papers) using a supervised learning method, Conditional Random Fields (CRF). A keyphrase is a word or set of words that captures the relationship between the content and the context of a document. Keyphrases are often topics of the document and represent its key ideas. Automatic keyphrase extraction is an important module for automatic systems such as query- or topic-independent summarization, question answering (QA), information retrieval (IR), and document classification. The system was developed for Task 5 of SemEval-2. It was trained on 144 scientific articles and tested on 100 scientific articles. Different combinations of features were used. With the combined keywords, i.e., both the author-assigned and reader-assigned keyword sets, as answers, the system achieves a precision of 32.34%, a recall of 33.09%, and an F-measure of 32.71% for the top 15 candidates.
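
As a consistency check on the reported scores, the following minimal sketch shows how precision, recall, and F-measure could be computed for the top 15 candidates of a single document. It assumes the balanced F-measure (the harmonic mean of precision and recall) and exact string matching against the combined keyword set. The function names `f_measure` and `evaluate_top_k` are hypothetical, and the task's official scorer may differ, for example in how phrases are normalized or how scores are averaged across documents.

```python
def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate_top_k(ranked_candidates, gold_keyphrases, k=15):
    """Score the top-k ranked candidates of one document against its gold set.

    Assumption: a candidate is correct only if it exactly matches a gold
    keyphrase (author-assigned or reader-assigned).
    """
    top_k = ranked_candidates[:k]
    gold = set(gold_keyphrases)
    correct = sum(1 for phrase in top_k if phrase in gold)
    precision = correct / len(top_k) if top_k else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)


# Plugging in the reported precision and recall (in percent)
# reproduces the reported F-measure.
print(round(f_measure(32.34, 33.09), 2))  # 32.71
```

The reported figures are internally consistent under this definition: 2 × 32.34 × 33.09 / (32.34 + 33.09) ≈ 32.71.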
