Temporal-Semantic Clustering of Newspaper Articles for Event Detection

In this paper we introduce a new clustering algorithm for event detection in newspaper articles. The algorithm has two main features. First, it uses the temporal references extracted from the document texts to define the document similarity function. Second, it works hierarchically. At the first level, documents with a high temporal-semantic similarity are grouped into individual events by applying the proposed similarity functions. At subsequent levels, these events are successively grouped so that more complex events and topics can be identified. The resulting hierarchy describes the structure of topics and events, taking into account when they occur. Current Topic Detection and Tracking (TDT) systems cannot accomplish these tasks.
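The abstract does not define the similarity function or the merging criterion, so the following Python code is only an illustrative sketch under stated assumptions, not the paper's method. It assumes that each document is a sparse term-weight vector plus a set of extracted dates. The hypothetical temporal-semantic similarity is the product of cosine similarity and a date-proximity score whose window depends on the level. Each level is built with group-average agglomerative clustering and a threshold. All names (Doc, temporal_sim, build_hierarchy), the toy data, and the parameter values are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from functools import partial
from math import sqrt


@dataclass
class Doc:
    terms: dict[str, float]                        # assumed: term -> weight (e.g. tf-idf)
    dates: set[date] = field(default_factory=set)  # temporal references extracted from the text


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Semantic similarity: cosine of two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def temporal_sim(da: set[date], db: set[date], window_days: int) -> float:
    """Hypothetical temporal similarity: share of dates in the smaller set that lie
    within `window_days` of some date in the other set."""
    if not da or not db:
        return 0.0
    small, large = (da, db) if len(da) <= len(db) else (db, da)
    hits = sum(1 for d in small if any(abs((d - e).days) <= window_days for e in large))
    return hits / len(small)


def similarity(x: Doc, y: Doc, window_days: int) -> float:
    """Assumed temporal-semantic similarity: high only if both components are high."""
    return cosine(x.terms, y.terms) * temporal_sim(x.dates, y.dates, window_days)


def agglomerate(items, sim, threshold):
    """Group-average agglomerative clustering: merge the most similar pair of clusters
    until no pair reaches `threshold`. Returns lists of indices into `items`."""
    clusters = [[i] for i in range(len(items))]

    def avg(c1, c2):
        return sum(sim(items[i], items[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while True:
        best, pair = None, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = avg(clusters[a], clusters[b])
                if s >= threshold and (best is None or s > best):
                    best, pair = s, (a, b)
        if pair is None:
            return clusters
        a, b = pair
        clusters[a] = clusters[a] + clusters.pop(b)  # b > a, so index a is unaffected


def merge(docs: list[Doc]) -> Doc:
    """Represent a cluster by its summed term weights and the union of its dates."""
    terms: dict[str, float] = {}
    for d in docs:
        for t, w in d.terms.items():
            terms[t] = terms.get(t, 0.0) + w
    return Doc(terms, set().union(*(d.dates for d in docs)))


def build_hierarchy(docs, levels=((0.5, 7), (0.2, 60))):
    """Each level is (threshold, window_days). Level k clusters the groups found at
    level k-1; the returned groups hold indices into the previous level's items."""
    result, items = [], list(docs)
    for threshold, window in levels:
        groups = agglomerate(items, partial(similarity, window_days=window), threshold)
        result.append(groups)
        items = [merge([items[i] for i in g]) for g in groups]
    return result


# Toy data (hypothetical): two reports of one earthquake, another earthquake, an election.
docs = [
    Doc({"earthquake": 1, "chile": 1}, {date(2010, 2, 27)}),
    Doc({"earthquake": 1, "chile": 1, "tsunami": 1}, {date(2010, 2, 28)}),
    Doc({"earthquake": 1, "haiti": 1}, {date(2010, 1, 12)}),
    Doc({"election": 1, "vote": 1}, {date(2010, 1, 17)}),
]
print(build_hierarchy(docs))  # [[[0, 1], [2], [3]], [[0, 1], [2]]]
```

With a narrow temporal window at the first level, topically similar articles about different occurrences stay apart, so only the two reports dated one day apart form an event. Widening the window at the next level lets the two earthquake events join a common topic while the election stays separate. Whether the real algorithm varies the window, the threshold, or the similarity function across levels is not stated in the abstract.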