Context identification of sentences in research articles: Towards developing intelligent tools for the research community

Scientific literature is an important medium for disseminating scientific knowledge. However, the dramatic recent increase in research output has created challenges for the research community. There is a growing need for tools that exploit the full content of an article and provide insightful services whose value goes beyond quantitative measures such as impact factors and citation counts. The intricacies of language and thought, together with the unstructured format of research articles, make such services difficult to provide. Identifying sentence contexts that encode the role of specific sentences in advancing an article’s scientific argument can facilitate the development of intelligent tools for the research community. This paper describes our research in this direction. First, we investigate the possibility of identifying the contexts associated with sentences and propose a scheme of thirteen context type definitions for sentences, based on the generic rhetorical pattern found in scientific articles. We then present the results of our experiments using sequential classifiers, namely conditional random fields, to achieve automatic context identification. We also describe the Semantic Web application we developed to provide citation-context-based information services for the research community. Finally, we compare and analyze our results against similar studies and explain the distinctive features of our application.
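Because a conditional random field labels a whole sequence of sentences jointly rather than one sentence at a time, the label chosen for one sentence can influence its neighbours. The sketch below illustrates only the decoding step of a linear-chain model (Viterbi search for the highest-scoring label sequence) under loudly stated assumptions: the three labels, the cue-word weights, and the transition weights are all hypothetical and hand-set for illustration. They are not the paper's thirteen context types or its trained model, and training (learning the weights from annotated sentences) is omitted.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL label set for illustration; not the paper's thirteen context types.
LABELS = ["BACKGROUND", "METHOD", "RESULT"]

# HYPOTHETICAL cue-word weights: (lower-cased word, label) -> weight.
# A trained CRF would learn such feature weights from annotated sentences.
EMISSION_WEIGHTS: Dict[Tuple[str, str], float] = {
    ("previous", "BACKGROUND"): 2.0,
    ("proposed", "METHOD"): 2.0,
    ("we", "METHOD"): 0.5,
    ("results", "RESULT"): 2.0,
    ("outperforms", "RESULT"): 1.5,
}

# HYPOTHETICAL transition weights: (previous label, current label) -> weight.
# They mildly reward staying in a context and moving forward in the argument.
TRANSITION_WEIGHTS: Dict[Tuple[str, str], float] = {
    ("BACKGROUND", "BACKGROUND"): 0.5,
    ("BACKGROUND", "METHOD"): 0.3,
    ("METHOD", "METHOD"): 0.5,
    ("METHOD", "RESULT"): 0.3,
    ("RESULT", "RESULT"): 0.5,
}


def emission_score(sentence: str, label: str) -> float:
    """Sum the weights of the sentence's words that fire for this label."""
    tokens = sentence.lower().split()
    return sum(EMISSION_WEIGHTS.get((tok, label), 0.0) for tok in tokens)


def viterbi(sentences: List[str]) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain model.

    The score of a labelling is the sum of per-sentence emission scores plus
    transition scores between consecutive labels; no normalization is needed
    because only the argmax is required.
    """
    if not sentences:
        return []
    # best[t][y]: best score of any label path over sentences[0..t] ending in y.
    best: List[Dict[str, float]] = [
        {y: emission_score(sentences[0], y) for y in LABELS}
    ]
    # back[t][y]: the previous label on that best path.
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(sentences)):
        best.append({})
        back.append({})
        for y in LABELS:
            prev, score = max(
                ((p, best[t - 1][p] + TRANSITION_WEIGHTS.get((p, y), 0.0))
                 for p in LABELS),
                key=lambda item: item[1],
            )
            best[t][y] = score + emission_score(sentences[t], y)
            back[t][y] = prev
    # Trace back from the best label for the final sentence.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for t in range(len(sentences) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))


if __name__ == "__main__":
    print(viterbi([
        "Previous work classified citations by function",
        "We describe the proposed labelling scheme",
        "The results show the model outperforms a baseline",
    ]))
```

With these made-up weights, the three example sentences decode to `BACKGROUND`, `METHOD`, `RESULT`. The transition weights are what distinguish this from labelling each sentence independently: a sentence with weak cue words can still inherit a plausible label from its neighbours.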