UMass at TDT 2000

We spent a fair amount of time this year rewriting our TDT system to provide more flexibility and to better integrate its various components. Rearchitecting the code, learning to deal with its peculiarities, and correcting bugs detracted substantially from research. As a result, the major approaches used in this evaluation are very similar to those used in TDT 1999. Our research had two thrusts, neither of which was ready to be deployed in this evaluation. We report here on results from the training data, in all cases explored within the link detection task. In the first direction, we looked more carefully at score normalization across different languages and media types. We found that normalizing scores differently depending on the source language improved results noticeably, though not substantially. In the second direction, we considered smoothing the vocabulary of each story using a “query expansion” technique from Information Retrieval, which adds words from the corpus to the story. This resulted in substantial improvements.
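
The two research directions can be illustrated with a minimal Python sketch. This summary does not say how scores were normalized or how expansion terms were chosen, so everything below is an assumption made for illustration: the per-condition z-score, the token-overlap similarity, the condition labels such as "eng-man", and the function and parameter names (normalize_by_condition, expand_story, k_docs, n_terms) are all hypothetical and are not taken from the UMass system.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def normalize_by_condition(scored_pairs):
    """Z-normalize link scores separately for each source condition.

    scored_pairs is a list of (condition, score) tuples, where condition
    is a hypothetical label for the language pair, e.g. "eng-eng".
    """
    by_condition = defaultdict(list)
    for condition, score in scored_pairs:
        by_condition[condition].append(score)

    # Per-condition mean and standard deviation (fall back to 1.0 if flat).
    stats = {c: (mean(s), pstdev(s) or 1.0) for c, s in by_condition.items()}

    return [(c, (s - stats[c][0]) / stats[c][1]) for c, s in scored_pairs]


def expand_story(story_tokens, corpus, k_docs=2, n_terms=3):
    """Append frequent terms from the most similar corpus stories.

    Similarity here is plain token overlap, an assumption chosen only to
    keep the sketch short.
    """
    story = Counter(story_tokens)

    def overlap(doc):
        return sum(min(story[t], c) for t, c in Counter(doc).items())

    neighbours = sorted(corpus, key=overlap, reverse=True)[:k_docs]
    candidates = Counter(t for doc in neighbours for t in doc if t not in story)
    return list(story_tokens) + [t for t, _ in candidates.most_common(n_terms)]


if __name__ == "__main__":
    pairs = [("eng-eng", 0.8), ("eng-eng", 0.6), ("eng-man", 0.2), ("eng-man", 0.1)]
    print(normalize_by_condition(pairs))

    corpus = [
        ["quake", "turkey", "rescue", "rescue", "istanbul"],
        ["election", "vote"],
        ["quake", "aftershock", "rescue"],
    ]
    print(expand_story(["quake", "turkey"], corpus))
```

In this sketch, normalizing within each condition puts scores from different language pairs on a comparable scale before a single decision threshold is applied, and expansion gives two short stories on the same topic more shared vocabulary to match on.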