Multiple Annotations of Reusable Data Resources: Corpora for Topic Detection and Tracking

Responding to demands for very large, easily accessible, reusable news corpora to support research in the topic detection and tracking paradigm, the Linguistic Data Consortium created the TDT corpora. In addition to supporting research in the Topic Detection and Tracking program, the TDT corpora were collected and annotated with an eye toward reuse and re-annotation. Their value is confirmed by the number of projects that have put part or all of the TDT corpora to new uses. The paragraphs that follow describe the raw data and annotations in the TDT corpora and summarize their use in multiple common-task research programs.