StreamHover: Livestream Transcript Summarization and Annotation

With the explosive growth of livestream broadcasting, there is an urgent need for new summarization technology that enables us to create previews of streamed content and tap into this wealth of knowledge. The problem is nontrivial, however, due to the informal nature of spoken language, and annotated datasets suitable for transcript summarization remain scarce. In this paper, we present StreamHover, a framework for annotating and summarizing livestream transcripts. With over 500 hours of videos annotated with both extractive and abstractive summaries, our benchmark dataset is substantially larger than existing annotated corpora. We explore a neural extractive summarization model that leverages a vector-quantized variational autoencoder (VQ-VAE) to learn latent vector representations of spoken utterances and identify salient utterances from the transcripts to form summaries. We show that our model generalizes better and improves performance over strong baselines. The results of this study provide an avenue for future research to improve summarization solutions for efficient browsing of livestreams.
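The core idea of quantization-based extractive summarization can be illustrated with a minimal sketch: embed each utterance, snap it to its nearest codebook vector, and prefer utterances that sit close to frequently used codes. Everything below (the scoring formula, function names, and toy data) is an illustrative assumption, not the paper's actual model, which learns the codebook end-to-end as part of a VQ-VAE.

```python
import numpy as np

def quantize(embeddings, codebook):
    # VQ step, simplified: assign each utterance embedding to its
    # nearest codebook vector (no training, no straight-through gradient).
    d = ((embeddings[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    codes = d.argmin(axis=1)
    return codes, codebook[codes]

def select_salient(embeddings, codebook, k=2):
    # Hypothetical salience score: utterances assigned to popular codes,
    # and lying near those codes, are treated as more representative.
    codes, quantized = quantize(embeddings, codebook)
    counts = np.bincount(codes, minlength=len(codebook))
    dist = np.linalg.norm(embeddings - quantized, axis=1)
    scores = counts[codes] / (1.0 + dist)
    return np.argsort(-scores)[:k]  # indices of the top-k utterances

# Toy data: 6 "utterance" embeddings, a 4-entry codebook.
rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 8))
book = rng.normal(size=(4, 8))
print(select_salient(emb, book, k=2))
```

In the actual model the codebook is learned jointly with the encoder, and salience is derived from the learned latent space rather than a fixed heuristic; this sketch only conveys the selection mechanism.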
