Paper information - Using Typed Dependencies to Study and Recognise Conceptualisation Zones in Biomedical Literature

In the biomedical domain, authors report their experiments and findings using a quasi-standard, coarse-grained discourse structure: an introduction that sets out the motivation, a description of the materials and methods, and finally the results and discussion. Over the years, a fair amount of research in scientific discourse analysis has focused on automatically recognising scientific artefacts, or conceptualisation zones, from the raw content of scientific publications. Most existing approaches use Machine Learning techniques to perform classification based on features that rely on the shallow structure of sentence tokens, or of sentences as a whole, in addition to corpus-driven statistics. In this article, we investigate the role played by the deep (dependency) structure of sentences in describing their rhetorical nature. Using association rule mining techniques, we study the presence of dependency structure patterns in the context of a given rhetorical type, the use of these patterns in exploring structural differences between the rhetorical types, and their ability to discriminate between the different rhetorical types. Our final goal is to provide a series of insights that can be used to complement existing classification approaches. Experimental results show that, particularly in a fine-grained multi-class classification setting, the association rules that emerge from the dependency structure are not able to produce uniform classification results. However, they can be used to derive discriminative pair-wise classification mechanisms, in particular for some of the most ambiguous types.
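To make the idea of mining association rules from dependency structure more concrete, the following is a minimal sketch and not the method used in the article. It assumes each sentence has already been parsed and reduced to the set of typed-dependency relation labels it contains. The toy sentences, the rhetorical labels "Method" and "Result", the function name `mine_rules`, and the thresholds are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

# Toy, hand-made data (hypothetical): each sentence is represented by the set
# of typed-dependency relation labels it contains, plus a rhetorical label.
SENTENCES = [
    ({"nsubj", "dobj", "aux"}, "Method"),
    ({"nsubjpass", "auxpass", "prep"}, "Method"),
    ({"nsubjpass", "auxpass", "agent"}, "Method"),
    ({"nsubj", "ccomp", "neg"}, "Result"),
    ({"nsubj", "ccomp", "advmod"}, "Result"),
    ({"nsubj", "dobj", "advmod"}, "Result"),
]


def mine_rules(sentences, max_size=2, min_count=2, min_confidence=0.8):
    """Mine rules of the form {relation pattern} -> rhetorical label.

    A pattern is a combination of up to `max_size` relation labels.
    Support is the fraction of sentences containing both pattern and label;
    confidence is the fraction of pattern occurrences that carry the label.
    """
    pattern_counts = Counter()  # number of sentences containing the pattern
    joint_counts = Counter()    # number of sentences with pattern and label
    for relations, label in sentences:
        for size in range(1, max_size + 1):
            for pattern in combinations(sorted(relations), size):
                pattern_counts[pattern] += 1
                joint_counts[(pattern, label)] += 1

    total = len(sentences)
    rules = []
    for (pattern, label), joint in joint_counts.items():
        confidence = joint / pattern_counts[pattern]
        if joint >= min_count and confidence >= min_confidence:
            rules.append((pattern, label, joint / total, confidence))
    # Most confident and best supported rules first.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0]))


if __name__ == "__main__":
    for pattern, label, support, confidence in mine_rules(SENTENCES):
        print(f"{pattern} -> {label}  "
              f"support={support:.2f} confidence={confidence:.2f}")
```

In this toy data the passive-voice pattern (`auxpass`, `nsubjpass`) is kept as a rule for "Method", while `nsubj` alone is discarded because it occurs in both classes and falls below the confidence threshold. A real study would work with a parsed, annotated corpus and would need to handle the much larger pattern space that produces.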

Tudor Groza