Automatic Slide Generation for Scientific Papers

We describe our approach for automatically generating presentation slides for scientific papers using deep neural networks. Such slides can give authors a starting point for preparing their presentations. Extractive summarization techniques are applied to rank and select important sentences from the original document. Previous work identified important sentences based only on a limited number of features extracted from the position and structure of sentences in the paper. Our method extends previous work by (1) extracting a more comprehensive list of surface features, (2) considering the semantics, or meaning, of each sentence, and (3) using the context around the current sentence when ranking it. Once the sentences are ranked, salient sentences are selected using Integer Linear Programming (ILP). Our results show the efficacy of our model for both the summarization and the slide generation tasks.
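The selection step can be illustrated with a minimal sketch. The abstract does not state the exact ILP formulation, so the version below assumes the simplest common one: each sentence i has a ranking score and a word count, a binary variable x_i marks whether it is selected, and the objective is to maximize the total score subject to a word budget. That 0/1 program has knapsack form, so the sketch solves it exactly with dynamic programming rather than a general ILP solver. The function name `select_sentences` and the example scores, lengths, and budget are hypothetical.

```python
from typing import List, Sequence


def select_sentences(scores: Sequence[float],
                     lengths: Sequence[int],
                     budget: int) -> List[int]:
    """Pick sentence indices maximizing the summed score within a word budget.

    Assumed formulation (not taken from the paper):
        maximize   sum(scores[i] * x[i])
        subject to sum(lengths[i] * x[i]) <= budget,  x[i] in {0, 1}
    """
    n = len(scores)
    # best[i][b] = best total score using only the first i sentences
    # with at most b words available.
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: skip sentence i-1
            if lengths[i - 1] <= b:
                # option 2: take sentence i-1 if it fits
                candidate = best[i - 1][b - lengths[i - 1]] + scores[i - 1]
                if candidate > best[i][b]:
                    best[i][b] = candidate

    # Walk the table backwards to recover which sentences were taken.
    chosen: List[int] = []
    b = budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= lengths[i - 1]
    chosen.reverse()  # return indices in document order
    return chosen


if __name__ == "__main__":
    # Hypothetical ranking scores and word counts for four sentences.
    example_scores = [0.9, 0.8, 0.3, 0.6]
    example_lengths = [12, 10, 5, 8]
    print(select_sentences(example_scores, example_lengths, budget=20))
```

A fuller formulation would typically add constraints that this sketch omits, such as penalizing redundant sentence pairs, at which point a general ILP solver becomes the more natural tool.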
