Results of the 1999 topic detection and tracking evaluation in Mandarin and English

The National Institute of Standards and Technology (NIST) administered the second open evaluation of Topic Detection and Tracking (TDT) technologies in 1999. The TDT project supports development of technologies that automatically organize event-related news stories. The program leverages expertise in core technologies, Automatic Speech Recognition (ASR), Document Retrieval (DR), and Machine Translation (MT) to build the TDT technologies. The 1999 TDT project extended the 1998 TDT project in two dimensions, first by adding Mandarin Chinese audio and text sources and second by adding two new evaluation tasks. Through experimental controls and conditioned analysis of system performance, the 1999 evaluation yielded numerous insights into the effects of multilingual texts on TDT technologies. Three notable generalizations arise from the evaluation: (1) English and Mandarin story segmentation performance is similar, (2) cross-lingual topic tracking performance is 44% worse than monolingual tracking, and (3) multilingual topic detection performance is 37% worse than monolingual topic detection.