Term Committee Based Event Identification and Dependency Discovery

With the overwhelming volume of news stories created and stored electronically every day, there is an increasing need for techniques that analyze and present news stories to users in a more meaningful manner. Most previous research focuses on organizing a news set into flat collections (topics) of stories. However, a topic in news is more than a mere collection of stories: it is characterized by a definite structure of inter-related events. Unfortunately, it is very difficult to identify events and their dependencies within a topic, because stories about the same topic are usually very similar to each other irrespective of the events they belong to. The reason is that stories within a topic usually share terms that relate to the topic as a whole rather than to a specific event. To deal with this problem, we propose two methods based on event key terms to identify events and discover event dependencies accurately. For event identification, we first capture tight term clusters as term committees of potential events, then use them to find the core story sets of those potential events, and finally assign every story to an event. For event dependency discovery, we emphasize the terms closely related to a given event, so that the similarity contributed by topic-popular terms is reduced. Experimental results on two Linguistic Data Consortium (LDC) datasets show that both proposed methods, for event identification and for event dependency discovery, achieve significant improvements over previous methods.
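The idea of reducing the similarity contributed by topic-popular terms can be illustrated with a minimal sketch. It is not the weighting scheme proposed here: it assumes a plain within-topic inverse document frequency as a stand-in, and the function names (`topic_idf`, `weighted_cosine`) and the toy stories are hypothetical.

```python
import math
from collections import Counter

def topic_idf(stories):
    """Within-topic IDF (an assumed stand-in weighting).

    A term that occurs in every story of the topic gets weight log(1) = 0,
    so it no longer contributes to story-to-story similarity.
    """
    n = len(stories)
    df = Counter(term for story in stories for term in set(story))
    return {term: math.log(n / count) for term, count in df.items()}

def weighted_cosine(a, b, weights):
    """Cosine similarity between two token lists under per-term weights."""
    va = {t: c * weights.get(t, 0.0) for t, c in Counter(a).items()}
    vb = {t: c * weights.get(t, 0.0) for t, c in Counter(b).items()}
    dot = sum(w * vb.get(t, 0.0) for t, w in va.items())
    na = math.sqrt(sum(w * w for w in va.values()))
    nb = math.sqrt(sum(w * w for w in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy topic: every story mentions "election" and "vote" (topic-popular terms).
stories = [
    ["election", "vote", "candidate", "debate"],
    ["election", "vote", "candidate", "recount"],
    ["election", "vote", "debate", "poll"],
    ["election", "vote", "recount", "court"],
]

idf = topic_idf(stories)
uniform = {term: 1.0 for term in idf}  # unweighted baseline

# Stories 0 and 3 share only topic-popular terms.
plain_03 = weighted_cosine(stories[0], stories[3], uniform)
event_03 = weighted_cosine(stories[0], stories[3], idf)

# Stories 0 and 2 also share the event-specific term "debate".
event_02 = weighted_cosine(stories[0], stories[2], idf)
```

Under uniform weights, stories 0 and 3 look moderately similar only because of the shared topic vocabulary. Under the within-topic weights their similarity drops to zero, while stories 0 and 2 stay related through the shared event-specific term.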