A Language Modeling Approach to Tracking News Events

This paper presents the TNO tracking system for the 2000 Topic Detection and Tracking evaluation project (TDT2000). The objective of the TDT tracking task is to track events of interest over time. As first-year participants in the TDT project, we originally aimed to build a baseline tracking system based on a language modeling approach. This approach had proved effective for the TREC adaptive filtering task and several other IR tasks. Whereas adaptive filtering focuses on adapting individual thresholds based on feedback, tracking requires a uniform decision threshold. The assumption that test document-topic scores are approximately normally distributed turned out to be a key ingredient. The results on the TDT2000 tracking task surpassed our expectations: our baseline system was ranked second for both the basic required condition and the alternate challenge condition.