AI Marker-based Large-scale AI Literature Mining

Academic literature contains knowledge that is well worth mining. Inspired by molecular marker tracing in biochemistry, this study uses three types of named entities, namely methods, datasets, and metrics, as AI markers for AI literature. These entities can be used to trace the research process described in the body of a paper, which opens up new ways to find and mine valuable academic information. First, an entity extraction model is used to extract AI markers from large-scale AI literature. Second, the original papers that introduced the AI markers are traced, and statistical and propagation analyses are performed on the tracing results. Finally, AI markers are clustered according to their co-occurrences, and both the evolution within method clusters and the influence relationships among different research scene clusters are explored. This AI marker-based mining yields many meaningful findings. For example, the propagation of effective methods across datasets has increased rapidly over time; effective methods proposed in China in recent years have had a growing influence on other countries, whereas the influence of those proposed in France has declined. Saliency detection, a classic computer vision research scene, is the least likely to be affected by other research scenes.
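The tracing and co-occurrence clustering steps can be illustrated with a minimal Python sketch. It assumes the AI markers have already been extracted for each paper, and it treats the earliest paper in the corpus that mentions a marker as a stand-in for that marker's original paper. The clustering shown here (link two markers when they co-occur in at least `min_count` papers, then take connected components) is a simple substitute chosen for illustration, not necessarily the algorithm used in the study. The corpus, marker names, function names, and the `min_count` parameter are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def trace_origins(papers):
    """Map each marker to the earliest paper in the corpus that mentions it.

    Assumption: the earliest mention approximates the paper that introduced
    the marker; ties on year are broken by paper id.
    """
    origin = {}
    for paper in sorted(papers, key=lambda p: (p["year"], p["id"])):
        for marker in paper["markers"]:
            origin.setdefault(marker, paper["id"])
    return origin


def cooccurrence_counts(papers):
    """Count how many papers mention each unordered pair of markers."""
    counts = Counter()
    for paper in papers:
        for a, b in combinations(sorted(paper["markers"]), 2):
            counts[(a, b)] += 1
    return counts


def cluster_markers(papers, min_count=2):
    """Group markers joined by pairs that co-occur in >= min_count papers.

    Clusters are the connected components of the thresholded co-occurrence
    graph, found with a small union-find structure.
    """
    parent = {m: m for p in papers for m in p["markers"]}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), n in cooccurrence_counts(papers).items():
        if n >= min_count:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for m in parent:
        groups[find(m)].add(m)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical toy corpus: methods, datasets, and metrics per paper.
papers = [
    {"id": "p1", "year": 2015, "markers": {"LSTM", "PTB", "perplexity"}},
    {"id": "p2", "year": 2016, "markers": {"LSTM", "PTB", "BLEU"}},
    {"id": "p3", "year": 2017, "markers": {"Transformer", "WMT14", "BLEU"}},
    {"id": "p4", "year": 2018, "markers": {"Transformer", "WMT14"}},
]

print(trace_origins(papers)["BLEU"])      # p2
print(cluster_markers(papers, min_count=2))
# [['BLEU'], ['LSTM', 'PTB'], ['Transformer', 'WMT14'], ['perplexity']]
```

With `min_count=2`, only pairs that appear together in at least two papers are linked, so the toy corpus splits into an LSTM/PTB group and a Transformer/WMT14 group, while markers seen with varied partners stay on their own. Raising the threshold yields smaller, tighter clusters; lowering it merges more markers into broader research scenes.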
