Comparative document summarization via discriminative sentence selection

Given a collection of document groups, a natural question is what distinguishes these groups from one another. In this paper, we study the novel problem of summarizing the differences between document groups. We propose a discriminative sentence selection method that extracts the most discriminative sentences, that is, the sentences that best capture the specific characteristics of each document group. Experiments on real-world data sets demonstrate the effectiveness of the proposed method.
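The abstract does not specify how a sentence's discriminativeness is measured. Purely as an illustration of the task, the sketch below scores each sentence by the mean smoothed log-ratio of its terms' in-group versus out-of-group probabilities and keeps the top `k` sentences per group. This scoring rule, the tokenizer, the smoothing parameter `alpha`, and the names `term_weights`, `sentence_score`, and `select_discriminative` are all assumptions for this example; they are not the selection criterion proposed in the paper.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Lowercase alphanumeric tokens; a deliberately simple tokenizer (assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def term_weights(groups: Dict[str, List[str]], alpha: float = 1.0) -> Dict[str, Dict[str, float]]:
    # Hypothetical helper: for each group g and term t, compute
    # log P(t | g) - log P(t | all other groups), with add-alpha smoothing
    # over the shared vocabulary. Positive weight = term is typical of g.
    counts = {g: Counter(t for s in sents for t in tokenize(s)) for g, sents in groups.items()}
    vocab = set().union(*counts.values())
    v = len(vocab)
    weights: Dict[str, Dict[str, float]] = {}
    for g, inside in counts.items():
        outside: Counter = Counter()
        for h, c in counts.items():
            if h != g:
                outside.update(c)  # Counter.update adds counts together
        n_in, n_out = sum(inside.values()), sum(outside.values())
        weights[g] = {
            t: math.log((inside[t] + alpha) / (n_in + alpha * v))
            - math.log((outside[t] + alpha) / (n_out + alpha * v))
            for t in vocab
        }
    return weights


def sentence_score(sentence: str, w: Dict[str, float]) -> float:
    # Mean term weight, so long sentences are not favoured just for their length.
    toks = tokenize(sentence)
    return sum(w[t] for t in toks) / len(toks) if toks else float("-inf")


def select_discriminative(groups: Dict[str, List[str]], k: int = 1, alpha: float = 1.0) -> Dict[str, List[str]]:
    # Hypothetical selector: keep the k highest-scoring sentences of each group
    # as that group's comparative summary.
    weights = term_weights(groups, alpha)
    return {
        g: sorted(sents, key=lambda s, w=weights[g]: sentence_score(s, w), reverse=True)[:k]
        for g, sents in groups.items()
    }


if __name__ == "__main__":
    # Toy input (invented for illustration): one sentence is shared by both groups.
    groups = {
        "sports": [
            "The team won the final match.",
            "Fans celebrated the championship.",
            "The weather was sunny today.",
        ],
        "finance": [
            "The stock market fell sharply.",
            "Investors worried about interest rates.",
            "The weather was sunny today.",
        ],
    }
    print(select_discriminative(groups, k=1))
```

On this toy input, the sentence shared by both groups gets near-zero term weights in each group and is not selected, while each group's summary is drawn from sentences whose vocabulary occurs mainly in that group. A real system would also need to handle redundancy among the selected sentences, which this sketch ignores.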
