Modeling Novelty and Feature Combination using Support Vector Regression for Update Summarization

Summarization is the process of condensing a piece of text while retaining its important information. A well-composed, coherent summary is one remedy for information overload. A sentence-extractive summarization system relies on a set of features to rank sentences before generating a summary. In this paper we provide a detailed analysis of the effect of various features in the context of update summarization. We adapt a machine learning algorithm, support vector regression, to combine features when scoring a sentence. Further, we propose a new feature that effectively captures the novelty of a sentence along with its relevance to a topic. Evaluation results show that our summarizer surpasses the top-performing systems that participated in the Text Analysis Conference (TAC) 2008. We also analyze the gap between oracle summaries and state-of-the-art summaries to show the scope for improvement in sentence-extractive summarization.
