A Keyphrase-Based Approach to Summarization: The LAKE System at DUC-2005

This paper reports on the participation of LAKE in DUC-2005. We propose to exploit a keyphrase extraction methodology to identify relevant terms in each document. A scoring mechanism then selects the best sentences for each cluster of documents. At its heart, the LAKE algorithm first considers a number of linguistic features to extract a list of well-motivated candidate keyphrases, then uses a machine learning framework to select the significant keyphrases for a document. Compared with other approaches to keyphrase extraction, LAKE makes use of linguistic processors, such as named entity recognition, that are not usually exploited. We discuss the results and comment on the human assessment (Linguistic Quality and Responsiveness of the summaries), the ROUGE-based evaluation, and the Pyramid evaluation.
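The sentence-scoring step described above can be illustrated with a minimal sketch. It is not the LAKE implementation: the abstract does not specify the scoring function, so the sketch assumes that each selected keyphrase carries a numeric weight and that a sentence's score is the sum of the weights of the keyphrases it contains. The function names `score_sentence` and `rank_sentences`, the substring matching, and the weights in the usage note are all hypothetical.

```python
from typing import Dict, List, Tuple


def score_sentence(sentence: str, keyphrases: Dict[str, float]) -> float:
    """Sum the weights of every keyphrase found in the sentence.

    Assumption: matching is a case-insensitive substring test; a real
    system would more likely match on tokens or lemmas.
    """
    text = sentence.lower()
    return sum(
        weight
        for phrase, weight in keyphrases.items()
        if phrase.lower() in text
    )


def rank_sentences(
    sentences: List[str],
    keyphrases: Dict[str, float],
    top_n: int = 1,
) -> List[Tuple[float, str]]:
    """Return the top_n (score, sentence) pairs, highest score first.

    Python's sort is stable, so sentences with equal scores keep their
    original document order.
    """
    scored = [(score_sentence(s, keyphrases), s) for s in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]
```

For instance, with the illustrative weights `{"keyphrase extraction": 2.0, "named entity": 1.5}`, a sentence mentioning both phrases would outrank a sentence mentioning only one, and a sentence mentioning neither would score zero. In the pipeline the abstract describes, the keyphrases and their weights would come from the machine learning selection step rather than being set by hand.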