Revisiting Centrality-as-Relevance: Support Sets and Similarity as Geometric Proximity: Extended abstract

In automatic summarization, centrality-as-relevance means that the most important content of an information source, or of a collection of information sources, corresponds to its most central passages, given a representation in which such a notion makes sense (graph, spatial, etc.). We assess the main paradigms and introduce a new centrality-based relevance model for automatic summarization that uses support sets to better estimate the relevant content. Semantic relatedness is computed as geometric proximity. Centrality (relevance) is determined by considering the whole input source, not only local information, and by taking into account the existence of minor topics or lateral subjects in the information sources to be summarized. The method creates, for each passage of the input source, a support set consisting only of the most semantically related passages. The most relevant content is then determined by selecting the passages that occur in the largest number of support sets. This model produces extractive summaries that are generic and both language- and domain-independent. Thorough automatic evaluation shows that the method achieves state-of-the-art performance in both written-text and automatically transcribed speech summarization, even when compared to considerably more complex approaches.
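The two steps described above (build a support set per passage, then rank passages by how many support sets contain them) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how passages are represented, which proximity measure is used, or how the size of a support set is chosen, so the bag-of-words vectors, the cosine similarity, the fixed size `k`, and all function names below are assumptions made for illustration only.

```python
import math
from collections import Counter


def vectorize(passage):
    # Assumption: a plain bag-of-words term-count vector per passage.
    return Counter(passage.lower().split())


def cosine(u, v):
    # Assumption: cosine similarity stands in for "geometric proximity".
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def support_sets(passages, k):
    # For each passage, keep the indices of its k most similar other passages.
    # Assumption: a fixed k; the source does not state how the size is chosen.
    vecs = [vectorize(p) for p in passages]
    sets = []
    for i, u in enumerate(vecs):
        others = [j for j in range(len(vecs)) if j != i]
        others.sort(key=lambda j: cosine(u, vecs[j]), reverse=True)
        sets.append(set(others[:k]))
    return sets


def rank_passages(passages, k=2):
    # Count, for each passage, the number of support sets it occurs in,
    # and return passage indices ordered from most to least supported.
    counts = Counter()
    for s in support_sets(passages, k):
        counts.update(s)
    return sorted(range(len(passages)), key=lambda i: counts[i], reverse=True)


if __name__ == "__main__":
    passages = [
        "the cat sat on the mat",
        "the cat sat",
        "the cat sat on the red mat",
        "stock markets fell sharply",
    ]
    for idx in rank_passages(passages, k=1):
        print(idx, passages[idx])
```

An extractive summary would then take the top-ranked passages until a length budget is reached. Because every passage contributes exactly one support set, a passage on a minor topic can still gather support from its few close neighbours, which is the intuition behind accounting for lateral subjects.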
