Classification of Citation Sentence for Filtering Scientific References

A citation sentence tells readers how the citing and cited articles are related by revealing the purpose the citation serves in the research. Besides giving credit to other researchers and recommending related articles, citations help readers understand what knowledge they have gained from the cited articles they have read. In this research, we define citation categories for filtering scientific references, as an initial step toward guided summarization of scientific articles. Our goal is to classify each citation sentence into one of five categories: ‘problem’, ‘other’, ‘useModel’, ‘useTool’ and ‘useData’. These categories make it easier to group scientific articles into more specific topics. As features we use voice, tense, citation location, meta-discourse and bag of words. We build the classification model with a linear SVM and apply a sampling technique, SMOTE, to handle the imbalanced dataset. The best F-measure for our citation classification, 61.2%, is achieved by combining voice and tense, meta-discourse and bag of words, and by oversampling the feature data of the ‘useData’ category with SMOTE.
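The oversampling step can be illustrated with a minimal sketch of the SMOTE idea: each synthetic example is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. This is not the implementation used in the study. The function name `smote`, its parameters, and the toy feature vectors are assumptions made for illustration only.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical feature vectors standing in for an under-represented category.
use_data = [[1.0, 0.0, 2.0], [1.5, 0.5, 2.5], [0.5, 0.0, 1.5], [1.0, 1.0, 2.0]]
extra = smote(use_data, n_new=6)
print(len(use_data) + len(extra), "samples after oversampling")
```

Because every synthetic vector is a convex combination of two real minority vectors, the new samples stay inside the region already occupied by that category, which is why the technique is usually applied only to the training split.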
