With vast amounts of information arriving each day, automatic techniques are needed to analyze and manage it. Topic Detection and Tracking (TDT) addresses this problem by organizing news stories into topics, treating each topic as a flat collection of stories. However, a news topic is not just a flat collection of stories; it is also a set of events. The result is a three-layer hierarchy (topic → event → story) that makes it easier for readers to follow new developments in the news, so recognizing the events within a topic is important. Unfortunately, two stories that belong to different events in the same topic usually have high similarity, because they share many common words. These shared words make events within a topic easy to confuse with one another. To address this problem, we present a novel approach to event identification. First, we remove topic-specific stopwords from each story. We then select some named entities as part of the feature set, because they are highly discriminative for identifying events. A further issue is feature weighting: in previous work, the weights on different features were determined empirically, whereas we propose a new method for calculating them. Experiments are conducted on a Linguistic Data Consortium dataset. The results show that our scheme for event identification significantly improves on previous methods.
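The effect of topic-specific stopwords on story similarity can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the paper: a term counts as a topic-specific stopword when it appears in more than a fixed fraction of the topic's stories (the hypothetical `max_df_ratio` parameter), and story similarity is cosine similarity over raw term counts. The function names and the toy stories are illustrative only, and the named-entity selection and feature-weighting steps are not shown.

```python
import math
from collections import Counter


def topic_stopwords(stories, max_df_ratio=0.8):
    """Return terms found in more than max_df_ratio of the topic's stories.

    Assumption: this document-frequency cutoff is an illustrative criterion,
    not the criterion used in the paper.
    """
    doc_freq = Counter()
    for tokens in stories:
        doc_freq.update(set(tokens))  # count each term once per story
    cutoff = max_df_ratio * len(stories)
    return {term for term, df in doc_freq.items() if df > cutoff}


def strip_terms(tokens, stopwords):
    """Drop every token that is in the stopword set."""
    return [t for t in tokens if t not in stopwords]


def cosine(tokens_a, tokens_b):
    """Cosine similarity between raw term-frequency vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy topic: three stories about one topic, each describing a different event.
stories = [
    "earthquake city rescue teams arrive".split(),
    "earthquake city aid donations pledged".split(),
    "earthquake city rebuilding plan announced".split(),
]

stop = topic_stopwords(stories)
before = cosine(stories[0], stories[1])
after = cosine(strip_terms(stories[0], stop), strip_terms(stories[1], stop))
print(sorted(stop), round(before, 2), round(after, 2))
```

In this toy topic the two terms shared by every story account for all of the similarity between the first two stories; once they are removed, stories from different events no longer look alike.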