Paper information - The PYTHY Summarization System: Microsoft Research at DUC 2007

PYTHY is a trainable extractive summarization engine that learns a log-linear sentence ranking model by maximizing three metrics of sentence goodness. Two of the metrics are based on ROUGE scores against model summaries; the third is based on Semantic Content Unit (SCU) weights associated with sentences selected by past peers, obtained during the Pyramid evaluations. In addition to sentences from the document set, our system considers simplified sentences for inclusion in the generated summaries. The feature weights of the model are optimized on the DUC 2005 data, and the final feature set for the submitted system was determined by ROUGE-2 scores against the DUC 2006 model summaries. For the DUC update task, the model was augmented with a novelty detection classifier.
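
To make the idea of a log-linear sentence ranking model concrete, here is a minimal sketch in Python. It is not the PYTHY implementation: the function name `log_linear_rank`, the three features, and the weight values are all hypothetical, chosen only to show how a weighted feature sum can be turned into a normalized score and a ranking over candidate sentences.

```python
import math


def log_linear_rank(feature_vectors, weights):
    """Score candidate sentences with a log-linear model.

    Each sentence gets the probability exp(w . f) / sum_j exp(w . f_j).
    Returns the probabilities and the sentence indices sorted best-first.
    """
    # Linear score w . f for every candidate sentence.
    scores = [sum(w * x for w, x in zip(weights, f)) for f in feature_vectors]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranking = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return probs, ranking


# Hypothetical features per sentence (assumptions, not from the paper):
# [topic-term overlap, position score, is-simplified flag]
features = [
    [0.2, 1.0, 0.0],
    [0.8, 0.5, 1.0],
    [0.5, 0.1, 0.0],
]
# Hypothetical weights; in a trained system these would be learned.
weights = [2.0, 0.5, 0.3]

probs, ranking = log_linear_rank(features, weights)
print(ranking)
```

In a trainable system the weights would be fit so that sentences with higher target scores (for example, ROUGE-based ones) receive higher probability; this sketch only covers the scoring step, with fixed weights.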
