PropBank: the Next Level of TreeBank

It has long been recognized that syntactic structure alone does not provide enough information for machine understanding of human language. Various efforts under the auspices of MUC [8] have added limited-coverage semantic lexicons to improve the performance of the systems under evaluation. To provide data for statistical techniques, several sites are investigating semantic annotation. The Prague Tectogrammatics project [3] endeavors to annotate semantic relationships at the same time as syntactic and morphological structure. The FrameNet project [4] eschews fine-grained syntactic structure in favor of "chunked" data and semantic annotation. This paper describes the PropBank project at Penn, which adds a layer of semantic annotation atop the syntactic structure already present in the Penn TreeBank [5,6].