From Episodes to Sagas: Understanding the News by Identifying Temporally Related Story Sequences

News interfaces are largely driven by recent information, even though many events are better interpreted in the context of previous events. To address this problem, we consider the task of constructing an explicit representation of a “saga”: a long-running series of related events. We define a timeline as a concrete representation of a saga, propose two unsupervised methods for timeline construction, and evaluate them against hand-produced timelines using a tree edit distance measure. We present preliminary results for these techniques on a weblog corpus and a supplementary news corpus, which show both promise and challenges.