BACO: A Background Knowledge- and Content-Based Framework for Citing Sentence Generation

In this paper, we focus on the problem of citing sentence generation: generating a short text that captures the salient information in a cited paper and the connection between the citing and cited papers. We present BACO, a BAckground knowledge- and COntent-based framework for citing sentence generation, which considers two types of information: (1) background knowledge, obtained by leveraging structural information from a citation network; and (2) content, which represents in-depth information about what to cite and why to cite. First, a citation network is encoded to provide background knowledge. Second, we apply salience estimation to identify what to cite by estimating the importance of sentences in the cited paper. During the decoding stage, both types of information are combined to facilitate text generation. We then jointly train the generator and a citation function classifier to make the model aware of why to cite. Our experimental results show that our framework outperforms comparative baselines.
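The two content-side ideas in the abstract, salience estimation over cited-paper sentences and joint training with citation function classification, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not specify the salience estimator or the form of the joint objective, so the sketch substitutes a simple Jaccard-overlap centrality score and a weighted sum of losses. The names `salience_scores`, `joint_loss`, and the `weight` parameter are hypothetical and do not come from the paper.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def salience_scores(sentences):
    """Score each sentence by its mean Jaccard overlap with the other sentences.

    Assumption: a stand-in centrality heuristic, not the estimator used in BACO.
    """
    token_sets = [tokenize(s) for s in sentences]
    scores = []
    for i, a in enumerate(token_sets):
        total = 0.0
        for j, b in enumerate(token_sets):
            if i == j:
                continue
            union = a | b
            total += len(a & b) / len(union) if union else 0.0
        # Average over the other sentences; guard against a single-sentence input.
        scores.append(total / max(len(sentences) - 1, 1))
    return scores


def joint_loss(generation_nll, function_probs, gold_function, weight=0.5):
    """Add a weighted cross-entropy term for the citation function to the generation loss.

    Assumption: a plain weighted sum; the paper's actual objective may differ.
    """
    classification_loss = -math.log(function_probs[gold_function])
    return generation_nll + weight * classification_loss
```

In this sketch, sentences that share more vocabulary with the rest of the cited paper receive higher scores and would be favored as candidates for what to cite, while the classification term penalizes the model when it assigns low probability to the gold citation function.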
