Story tracking: linking similar news over time and across languages

The Europe Media Monitor system (EMM) gathers and aggregates an average of 50,000 newspaper articles per day in over 40 languages. To manage this information overflow, the system groups similar articles per day and per language into clusters and links the daily clusters over time into stories. A story automatically comes into existence when related groups of articles occur within a 7-day window. Cross-lingual links across 19 languages for individual news clusters have been displayed since 2004 as part of a freely accessible online application (http://press.jrc.it/NewsExplorer); the newest development is the linking of entire stories across languages. An evaluation of both the monolingual aggregation of historical clusters into stories and the linking of stories across languages yielded mostly satisfactory results.
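To make the 7-day window concrete, the following is a minimal sketch of how daily clusters might be chained into stories. It is an illustration under stated assumptions, not EMM's implementation: the keyword-set cluster representation, the Jaccard similarity, the 0.5 threshold, the greedy attachment strategy, and all names (`Cluster`, `Story`, `link_into_stories`) are hypothetical. Only the 7-day window and the requirement that a story contain more than one related cluster come from the description above.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class Cluster:
    day: date
    terms: frozenset  # hypothetical keyword set describing one daily cluster


@dataclass
class Story:
    clusters: list = field(default_factory=list)


def jaccard(a, b):
    """Set overlap in [0, 1]; a stand-in for the system's real similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_into_stories(clusters, window_days=7, threshold=0.5):
    """Greedily attach each daily cluster to the best-matching recent chain.

    Assumption: a cluster is "related" to an earlier one if their term
    sets have Jaccard similarity >= threshold and the earlier cluster is
    at most `window_days` days older.
    """
    chains = []
    window = timedelta(days=window_days)
    for c in sorted(clusters, key=lambda cl: cl.day):
        best, best_sim = None, threshold
        for chain in chains:
            for prev in chain.clusters:
                # Only strictly earlier clusters inside the window can link.
                if timedelta(0) < c.day - prev.day <= window:
                    sim = jaccard(c.terms, prev.terms)
                    if sim >= best_sim:
                        best, best_sim = chain, sim
        if best is None:
            best = Story()
            chains.append(best)
        best.clusters.append(c)
    # A story only exists once at least two related clusters are linked.
    return [chain for chain in chains if len(chain.clusters) >= 2]


clusters = [
    Cluster(date(2008, 1, 1), frozenset({"a", "b", "c"})),
    Cluster(date(2008, 1, 2), frozenset({"x", "y"})),
    Cluster(date(2008, 1, 3), frozenset({"a", "b", "d"})),
    Cluster(date(2008, 1, 20), frozenset({"a", "b", "c"})),
]
stories = link_into_stories(clusters)
```

In this toy input, the clusters of 1 and 3 January share enough terms to form one story, the unrelated cluster of 2 January stays unlinked, and the cluster of 20 January falls outside the window even though its terms match.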