We report on our ongoing investigation into the relationship between the linear order of text segments and the underlying argument structure in certain newspaper commentaries. After briefly introducing our corpus and the general layout of the research project, we describe our approach to representing argument structure as a “support graph”. Then we turn to the relation between this abstract structure and the linearization of the argument in the text; to this end, we suggest a mapping between the support graph and the text linearization, and offer some first observations on correlations.

1 Research framework and corpus

Our research is embedded in the framework of multi-level annotation, an approach that does not aim at capturing discourse-structural phenomena in a single representation, but distributes information into several different conceptual realms and corresponding distinct technical annotation layers (see Stede 2007 and Stede 2008). Texts are analyzed on levels such as syntax, coreference, information structure, or conjunctive relations, and annotations are produced with dedicated software tools. The results of the individual annotations are stored in a database that allows for viewing the annotations, querying the data across annotation levels, and running statistical analyses to explore relationships between different levels (a step that we label “annotation mining”).

The research we report here is a pilot study in which the authors carefully examined 11 texts and negotiated “gold standard” analyses of argument structure and also of rhetorical structure in line with Mann & Thompson (1988). The experience gained in this negotiation process leads to the formulation of specific and detailed annotation guidelines. The “real” study will then involve independent annotators working solely on the basis of the guidelines.
Interannotator agreement will be measured to check whether the task is manageable; if so, correlations between these and other annotations (on different levels) will be investigated systematically.

The corpus we use is a collection of German newspaper commentaries (Stede 2004a). For the specific research reported in this paper, we focus on a particular sub-corpus with commentaries from the Pro & Contra section of Tagesspiegel am Sonntag. These short pieces (12 to 16 sentences) reply to a yes/no question currently under debate in Berlin politics; both a “pro” and a “contra” opinion are published next to each other, accompanied by an article giving background information. Thus, in these texts we find very crisp argumentation: authors

[1] Manfred Stede. RST revisited: disentangling nuclearity. 2008.
[2] Christian Chiarcos, et al. Rhetorical distance revisited: A parameterized approach. 2008.
[3] S. Toulmin. The uses of argument. 1960.
[4] William C. Mann, et al. Rhetorical Structure Theory: Toward a functional theory of text organization. 1988.
[5] Christian Chiarcos, et al. A Flexible Framework for Integrating Annotations from Different Tools and Tagsets. 2008.
[6] Ian H. Witten, et al. Data mining - practical machine learning tools and techniques, Second Edition. The Morgan Kaufmann series in data management systems, 2005.
[7] Alex Lascarides, et al. Logics of Conversation. Studies in natural language processing, 2005.
[8] Nancy Ide, et al. Veins Theory: A Model of Global Discourse Cohesion and Coherence. ACL, 1998.
[9] James B. Freeman, et al. Dialectics and the Macrostructure of Arguments. 1991.
[10] Chris Reed, et al. Representing dialogic argumentation. Knowl. Based Syst., 2006.
[11] Manfred Stede, et al. Korpusgestützte Textanalyse: Grundzüge der Ebenen-orientierten Textlinguistik. 2007.
[12] Manfred Stede, et al. The Potsdam Commentary Corpus. ACL 2004.