Multiple Annotations of Reusable Data Resources: Corpora for Topic Detection and Tracking
Responding to demand for very large, easily accessible, reusable news corpora to support research in the topic detection and tracking paradigm, the Linguistic Data Consortium created the TDT corpora. Beyond supporting research in the Topic Detection and Tracking program, the TDT corpora were collected and annotated with an eye toward reuse and re-annotation. Their value is borne out by the number of projects that have put part or all of the TDT corpora to new uses. The paragraphs that follow describe the raw data and annotations in the TDT corpora and summarize their use in multiple common-task research programs.