Towards Topic-Aware Slide Generation For Academic Papers With Unsupervised Mutual Learning

Slides are commonly used to present information and tell stories. In academic and research communities, slides are typically used to summarize the findings of accepted papers for presentation at meetings and conferences. Slides for academic papers usually cover common, essential topics such as major contributions, model design, experiment details and future work. In this paper, we aim to automatically generate slides for academic papers. We first conduct an in-depth analysis of how humans create slides, and then mine frequently used slide topics. Given a topic, our approach extracts relevant sentences from the paper to produce draft slides. Because labeled data is lacking, we integrate prior knowledge about ground-truth sentences into a log-linear model to create an initial pseudo-target distribution. Two sentence extractors are then learned collaboratively, each bootstrapping the other's performance. Evaluation results on a labeled test set show that our model extracts more relevant sentences than baseline methods. Human evaluation also shows slides generated by our model can serve as a good basis for preparing the final
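As a rough illustration of the two ideas named in the abstract, the sketch below builds a pseudo-target distribution over sentences with a log-linear model and then computes a mutual-learning style loss for one extractor. It is not the paper's implementation: the feature definitions, the weights, the extractor outputs, and the exact form of the loss (a sum of two KL terms) are all assumptions made for the example.

```python
import math

def log_linear_distribution(features, weights):
    """Return p(i) proportional to exp(sum_k weights[k] * features[i][k])."""
    scores = [sum(w * f for w, f in zip(weights, row)) for row in features]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def mutual_loss(own, peer, target):
    """Assumed loss for one extractor: match the pseudo-target and the peer."""
    return kl_divergence(target, own) + kl_divergence(peer, own)

# Hypothetical prior-knowledge features for three sentences:
# [keyword overlap with the topic, 1.0 if the sentence is in a matching section]
features = [
    [0.8, 1.0],
    [0.1, 0.0],
    [0.4, 1.0],
]
weights = [2.0, 1.0]  # assumed feature weights

pseudo_target = log_linear_distribution(features, weights)

# Made-up output distributions of two sentence extractors over the same sentences.
extractor_a = [0.5, 0.2, 0.3]
extractor_b = [0.6, 0.1, 0.3]

loss_a = mutual_loss(extractor_a, extractor_b, pseudo_target)
loss_b = mutual_loss(extractor_b, extractor_a, pseudo_target)
print(pseudo_target, loss_a, loss_b)
```

In this setup each extractor is pulled toward both the prior-knowledge distribution and its peer's current prediction, which is one common way to realize collaborative training when no labels are available.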
