Comparative news summarization using concept-based optimization

Comparative news summarization aims to highlight the commonalities and differences between two comparable news topics in human-readable sentences. The summary should focus on the salient comparative aspects of both topics while also describing the representative properties of each topic. In this study, we propose a novel approach for generating comparative news summaries. We treat cross-topic pairs of semantically related concepts as evidence of comparativeness and topic-related concepts as evidence of representativeness. The score of a summary is estimated by summing the weights of the evidence it contains. We formalize the summarization task as an optimization problem of selecting the sentences that maximize this score, and we solve it with a mixed integer programming model. The experimental results demonstrate the effectiveness of the proposed model.
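
The scoring and selection idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy sentences, the concept and pair weights, the word budget, and the names `score` and `best_summary` are all hypothetical, and an exhaustive search stands in for the mixed integer programming model so that the example runs without a solver. For simplicity, the toy concept names are distinct across the two topics.

```python
from itertools import combinations

# Hypothetical toy data: (topic, length in words, concepts in the sentence).
SENTENCES = [
    ("A", 10, {"quake", "casualties"}),
    ("A", 8, {"quake", "magnitude"}),
    ("A", 6, {"rescue"}),
    ("B", 9, {"tremor", "victims"}),
    ("B", 7, {"aid"}),
    ("B", 8, {"tremor", "epicenter"}),
]

# Assumed weights of topic-related concepts (evidence of representativeness).
CONCEPT_WEIGHTS = {
    "quake": 1.0, "casualties": 0.8, "magnitude": 0.5, "rescue": 0.6,
    "tremor": 1.0, "victims": 0.8, "aid": 0.6, "epicenter": 0.4,
}

# Assumed weights of cross-topic concept pairs (evidence of comparativeness).
PAIR_WEIGHTS = {
    ("casualties", "victims"): 2.0,
    ("rescue", "aid"): 1.5,
    ("magnitude", "epicenter"): 0.5,
}


def score(selected):
    """Sum the weights of all evidence covered by the selected sentences."""
    covered = set()
    for i in selected:
        covered |= SENTENCES[i][2]
    # Each concept counts once, no matter how many sentences mention it.
    representativeness = sum(CONCEPT_WEIGHTS[c] for c in covered)
    # A pair counts only if both of its concepts appear in the summary.
    comparativeness = sum(
        w for (a, b), w in PAIR_WEIGHTS.items() if a in covered and b in covered
    )
    return representativeness + comparativeness


def best_summary(max_words):
    """Exhaustively search for the best sentence subset within the word budget."""
    best, best_score = (), 0.0
    indices = range(len(SENTENCES))
    for k in range(1, len(SENTENCES) + 1):
        for subset in combinations(indices, k):
            if sum(SENTENCES[i][1] for i in subset) > max_words:
                continue  # violates the (assumed) length constraint
            s = score(subset)
            if s > best_score:
                best, best_score = subset, s
    return best, best_score


if __name__ == "__main__":
    selected, total = best_summary(max_words=32)
    for i in selected:
        print(SENTENCES[i])
```

With a 32-word budget, this toy instance selects sentences 0, 2, 3, and 4: the two highest-weight cross-topic pairs are covered, which outweighs adding further topic-specific concepts. Exhaustive search is only feasible for a handful of sentences; a real instance would presumably encode sentence, concept, and pair selection as binary variables in an integer program.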
