On the Discoursive Structure of Computer Graphics Research Papers

Understanding the structure of scientific discourse is essential for developing Natural Language Processing tools that can extract and summarize information from research articles. In this paper we present an annotated corpus of scientific discourse in the domain of Computer Graphics. We describe how we built the corpus: we designed an annotation schema and relied on three annotators to manually classify every sentence into the defined categories. The corpus constitutes a semantically rich resource for scientific text mining. We also present the results of our initial experiments on automatically classifying sentences into the five main categories of the corpus.
