Weighted archetypal analysis of the multi-element graph for query-focused multi-document summarization

Most existing research on applying matrix factorization approaches to query-focused multi-document summarization (Q-MDS) explores either soft/hard clustering or low-rank approximation methods. We apply a different kind of matrix factorization method, namely weighted archetypal analysis (wAA), to Q-MDS. In query-focused summarization, given a graph representation of a set of sentences weighted by their similarity to the given query, the positively and/or negatively salient sentences are values on the boundary of the weighted data set. We use wAA to compute these extreme values, the archetypes, and hence to estimate the importance of sentences in the target document set. We also investigate the impact of using the multi-element graph model for query-focused summarization via wAA. We conducted experiments on the Document Understanding Conference (DUC) 2005 and 2006 data sets. Experimental results show that the proposed approach improves on other closely related methods and on many state-of-the-art systems.
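
The abstract does not give the optimization details of wAA. As a rough illustration only, the Python sketch below (assuming NumPy) uses one way to formulate weighted archetypal analysis: find row-stochastic matrices A (n x k) and B (k x n) that minimize sum_i w_i ||x_i - (A B X)_i||^2. The k archetypes Z = B X are then convex combinations of the data points, and each point is approximated by a convex combination of archetypes, with its error scaled by its weight w_i. The solver (alternating projected gradient steps with a Euclidean projection onto the probability simplex), the function names, the sentence feature matrix X and the use of query similarity as the weights w are all assumptions made for illustration. The multi-element graph construction is not modeled, and this is not the authors' implementation.

```python
import numpy as np


def project_rows_to_simplex(M):
    """Euclidean projection of each row of M onto the probability simplex."""
    n, k = M.shape
    U = -np.sort(-M, axis=1)                   # each row sorted in descending order
    css = np.cumsum(U, axis=1) - 1.0           # cumulative sums minus the target total of 1
    j = np.arange(1, k + 1)
    rho = np.sum(U - css / j > 0, axis=1)      # support size of each projected row
    theta = css[np.arange(n), rho - 1] / rho   # per-row shift that makes the row sum to 1
    return np.maximum(M - theta[:, None], 0.0)


def weighted_archetypal_analysis(X, w, k, n_iter=200, seed=0):
    """Sketch of weighted AA: minimize sum_i w_i * ||x_i - (A @ B @ X)_i||^2.

    X: (n, d) data, one row per sentence (hypothetical feature vectors).
    w: (n,) non-negative weights (assumed here: similarity to the query).
    Returns A (n, k) and B (k, n), whose rows lie on the simplex, plus the loss history.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    B = np.zeros((k, n))
    B[np.arange(k), rng.choice(n, size=k, replace=False)] = 1.0  # archetypes start at k distinct data points
    A = np.full((n, k), 1.0 / k)                                  # each point starts as the mean of the archetypes
    sqrt_w = np.sqrt(w)[:, None]
    x_norm_sq = np.linalg.norm(X, 2) ** 2                         # squared spectral norm of X

    def loss(A, B):
        return float(np.sum(w[:, None] * (X - A @ B @ X) ** 2))

    history = [loss(A, B)]
    for _ in range(n_iter):
        # A-step: projected gradient step of size 1/L_A with the archetypes Z fixed.
        Z = B @ X
        R = X - A @ Z
        grad_A = -2.0 * (w[:, None] * R) @ Z.T
        L_A = 2.0 * w.max() * np.linalg.norm(Z, 2) ** 2 + 1e-12
        A = project_rows_to_simplex(A - grad_A / L_A)
        # B-step: projected gradient step of size 1/L_B with A fixed.
        R = X - A @ B @ X
        grad_B = -2.0 * A.T @ (w[:, None] * R) @ X.T
        L_B = 2.0 * np.linalg.norm(sqrt_w * A, 2) ** 2 * x_norm_sq + 1e-12
        B = project_rows_to_simplex(B - grad_B / L_B)
        history.append(loss(A, B))
    return A, B, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.random((30, 5))   # 30 toy "sentences" with 5 toy features each
    w = rng.random(30)        # toy query-similarity weights
    A, B, hist = weighted_archetypal_analysis(X, w, k=3)
    scores = B.sum(axis=0)    # hypothetical score: total weight each sentence gives to the archetypes
    print(np.argsort(-scores)[:5])
```

The final score line is a hypothetical ranking rule; how the paper turns archetypes into sentence scores is not stated in the abstract. Using step 1/L, where L bounds the Lipschitz constant of each block's gradient, keeps the weighted loss non-increasing across iterations, which the returned history lets you check.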
