Update Summarization

Update summarization is a form of multi-document summarization in which we generate a summary of a multi-document dataset under the assumption that the user has already read a given earlier set of documents. In this paper, we present a summarization system that clusters sentences from the old set based on a semantic similarity score. We then use the centroids of these clusters, along with an information content score, to identify fresh or changed sentences in the subsequent set. These relevant sentences are ordered by their position in the original document and limited to 100 words to produce the update summaries. We discuss the components of our system, the algorithms used, and the motivation for choosing them. We also analyze the results at every stage and outline future work. We used our system to generate update summaries for the dataset available as part of the DUC 2007 Update Summarization task. The ROUGE scores obtained for the generated summaries are comparable to those of other participants.
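
As a rough illustration of the pipeline described above, the following Python sketch clusters old-set sentences, scores new-set sentences against the cluster centroids, and assembles a position-ordered summary within a word budget. It is not the system's actual implementation: the bag-of-words cosine similarity stands in for the semantic similarity score, the distinct-term ratio stands in for the information content score, and the greedy clustering, the thresholds, and all function names (`vectorize`, `cluster_centroids`, `update_summary`) are assumptions made for the example.

```python
import math
import re
from collections import Counter


def vectorize(sentence):
    """Bag-of-words term counts (assumed stand-in for a semantic representation)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_centroids(old_sentences, threshold=0.3):
    """Greedy single-pass clustering; a centroid is the summed vector of its members."""
    centroids = []
    for sentence in old_sentences:
        vec = vectorize(sentence)
        best = max(centroids, key=lambda c: cosine(vec, c), default=None)
        if best is not None and cosine(vec, best) >= threshold:
            best.update(vec)  # summing is enough because cosine ignores scale
        else:
            centroids.append(Counter(vec))
    return centroids


def update_summary(old_sentences, new_sentences, max_words=100, novelty_floor=0.5):
    """Select novel sentences from the new set and return them in document order."""
    centroids = cluster_centroids(old_sentences)
    scored = []
    for position, sentence in enumerate(new_sentences):
        vec = vectorize(sentence)
        if not vec:
            continue
        # Novelty: distance from the closest cluster of already-read content.
        novelty = 1.0 - max((cosine(vec, c) for c in centroids), default=0.0)
        # Crude information score (assumption): share of distinct terms.
        info = len(vec) / sum(vec.values())
        if novelty >= novelty_floor:
            scored.append((novelty * info, position, sentence))

    # Take the best-scoring sentences that fit the budget (whole sentences only).
    chosen, used = [], 0
    for _, position, sentence in sorted(scored, key=lambda item: -item[0]):
        length = len(sentence.split())
        if used + length <= max_words:
            chosen.append((position, sentence))
            used += length

    # Restore the original document order before joining.
    return " ".join(sentence for _, sentence in sorted(chosen))


old_set = [
    "The storm hit the coast on Monday.",
    "Officials ordered evacuations along the coast.",
]
new_set = [
    "The storm hit the coast on Monday.",
    "Relief funds of two million dollars were approved on Friday.",
]
print(update_summary(old_set, new_set))
```

In this sketch a sentence that would exceed the budget is skipped rather than cut mid-sentence; truncating the final sentence at exactly 100 words is an equally plausible reading of the word limit.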