A Tool for Semi-Automatic and Interactive Annotation of Dialogue Utterances with Information States

The availability of linguistically annotated corpora such as the Penn Treebank has long proven beneficial for computational linguistics research. However, only recently have efforts begun to provide corpora with discourse information (see, e.g., URML (Reitter and Stede, 2003) or the Penn Discourse Treebank (PDTB) project at the University of Pennsylvania). Here we report on a project whose goal is to generate a corpus annotated with deep discourse-semantic information, which can then be used to train statistical models of semantic interpretation. We introduce our general methodology and describe how it is instantiated in a purpose-built tool that supports interactive, semi-automated annotation. The novel feature of this tool is its use of a reasoning engine, which implements a semantic theory of discourse interpretation, to suggest annotations to the user.