Applying Semantic Classes in Event Detection and Tracking

Event detection and tracking is a relatively recent area of information retrieval research. Detection aims to spot new, previously unreported real-life events in online news feeds, while tracking assigns documents to previously detected events. We propose a new vector model that represents each document with four semantic classes: locations, proper names, temporal expressions, and normal terms, each stored in a designated subvector. We also propose a new similarity measure that utilizes these semantic classes. Moreover, because the concept of an event is vague, we run our experiments with several different definitions of it. In our experiments on a Finnish online news-stream corpus, we find that the use of semantic classes improves performance significantly. Furthermore, the granularity at which the events are labeled influences the efficiency of the topic detection and tracking (TDT) tasks.
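
As a rough illustration of the document representation described above, the following Python sketch stores each semantic class in its own subvector and compares two documents class by class. It is only a sketch under stated assumptions: the abstract does not specify the similarity measure, so the per-class cosine, the raw term-frequency weighting, the uniform class weights, the choice to skip classes that are empty in either document, and the names `make_doc` and `class_similarity` are all hypothetical choices made for this example.

```python
import math
from collections import Counter

# The four semantic classes named in the abstract.
CLASSES = ("locations", "names", "temporals", "terms")


def make_doc(**by_class):
    """Build a document as one term-frequency subvector per semantic class."""
    return {c: Counter(by_class.get(c, ())) for c in CLASSES}


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def class_similarity(d1, d2, weights=None):
    """Weighted average of per-class cosines (an assumed combination rule).

    Classes that are empty in either document are skipped, so a missing
    class neither rewards nor penalizes the pair. This is an assumption,
    not the measure proposed in the paper.
    """
    weights = weights or {c: 1.0 for c in CLASSES}
    active = [c for c in CLASSES if d1[c] and d2[c]]
    total = sum(weights[c] for c in active)
    if total == 0:
        return 0.0
    return sum(weights[c] * cosine(d1[c], d2[c]) for c in active) / total


# Hypothetical example documents.
doc_a = make_doc(locations=["helsinki"], names=["nokia"], terms=["strike", "union"])
doc_b = make_doc(locations=["helsinki"], names=["nokia"], terms=["strike", "union"])
doc_c = make_doc(locations=["oslo"], terms=["election"])

same_event = class_similarity(doc_a, doc_b)
different_event = class_similarity(doc_a, doc_c)
```

Keeping the classes in separate subvectors means a match on a location or a proper name can be weighted differently from a match on an ordinary term, which is the intuition behind comparing documents class by class.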