Explaining Relationships Between Scientific Documents

We address the task of explaining relationships between two scientific documents using natural language text. This task requires modeling the complex content of long technical documents, deducing a relationship between these documents, and expressing that relationship in text. Successful solutions can help improve researcher efficiency in search and review. In this paper, we operationalize this task by using citing sentences as a proxy. We establish a large dataset for our task. We pretrain a large language model to serve as the foundation for autoregressive approaches to the task. We explore the impact of taking different views on the two documents, including the use of dense representations extracted with scientific information extraction systems. We provide extensive automatic and human evaluations which show the promise of such models, and make clear the challenges for future work.
