Mining Algorithm Roadmap in Scientific Publications

The number of scientific publications is ever increasing. The long time needed to digest a scientific paper poses a great challenge to the number of papers people can read, which impedes a quick grasp of the major activities in new research areas, especially for intelligence analysts and novice researchers. To accelerate this process, we first define a new problem, mining the algorithm roadmap in scientific publications, and then propose a new weakly supervised method to build the roadmap. The algorithm roadmap describes the evolutionary relations between algorithms and sketches the ongoing research and dynamics of an area. It is a tool that helps analysts and researchers locate the successors and families of algorithms when analyzing and surveying a research field. We first propose abbreviated words as candidate algorithm names and then use tables as weak supervision to label these candidates. Next, we propose a new method, the Cross-sentence Attention NeTwork for cOmparative Relation (CANTOR), to extract comparative algorithm pairs from text. Finally, we derive an order for each algorithm pair using publication time and mention frequency to construct the algorithm roadmap. Comprehensive experiments show that the proposed method outperforms baseline methods on the proposed task.

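As a rough illustration of the pipeline sketched above, the following Python snippet shows how abbreviation candidates could be proposed from text and how a comparative algorithm pair could be ordered by first-appearance year and mention frequency before being added to the roadmap. The regex heuristic and the helper names (extract_abbreviation_candidates, order_pair) are hypothetical illustrations, not the paper's actual implementation.

```python
# Hypothetical sketch of two steps from the abstract:
# (1) propose abbreviation-like words as algorithm-name candidates;
# (2) order a comparative pair by first-appearance year, breaking ties with
#     mention frequency, to obtain a directed (predecessor, successor) edge.
import re
from collections import Counter

# Words with at least two capital letters and optional digits, e.g. "LSTM", "BERT", "ResNet50".
ABBREV_PATTERN = re.compile(r"\b(?=\w*[A-Z]\w*[A-Z])[A-Za-z]\w{1,14}\b")

def extract_abbreviation_candidates(text: str) -> Counter:
    """Count abbreviation-like tokens that may name algorithms."""
    return Counter(ABBREV_PATTERN.findall(text))

def order_pair(algo_a: str, algo_b: str,
               first_year: dict, mention_count: Counter) -> tuple:
    """Return (predecessor, successor) for a comparative pair.

    Heuristic: the algorithm that appeared earlier is the predecessor;
    ties are broken by mention frequency (the more frequent one is
    assumed to be the older, better-established method).
    """
    key = lambda a: (first_year.get(a, 9999), -mention_count[a])
    return tuple(sorted([algo_a, algo_b], key=key))

if __name__ == "__main__":
    text = "We compare BERT against LSTM and CNN baselines on several benchmarks."
    counts = extract_abbreviation_candidates(text)
    years = {"LSTM": 1997, "CNN": 1998, "BERT": 2019}
    edge = order_pair("BERT", "LSTM", years, counts)
    print(counts, edge)  # edge ("LSTM", "BERT"): BERT is a successor of LSTM
```

In the full system, such directed edges would be aggregated over the corpus to form the roadmap graph; the snippet only illustrates the ordering heuristic for a single pair.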