The Microsoft Academic Knowledge Graph: A Linked Data Source with 8 Billion Triples of Scholarly Data

In this paper, we present the Microsoft Academic Knowledge Graph (MAKG), a large RDF data set containing over eight billion triples that describe scientific publications and related entities, such as authors, institutions, journals, and fields of study. The data set is licensed under the Open Data Commons Attribution License (ODC-By). By providing the data both as RDF dump files and as a data source in the Linked Open Data cloud, with resolvable URIs and links to other data sources, we bring a vast amount of scholarly data to the Web of Data. Furthermore, we provide entity embeddings for all 210 million represented publications. The MAKG facilitates a number of use cases, particularly in the field of digital libraries, such as (1) entity-centric exploration of papers, researchers, affiliations, etc.; (2) data integration tasks using RDF as a common data model and links to other data sources; and (3) data analysis and knowledge discovery of scholarly data.
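As an illustration of use case (1), the following minimal sketch groups the triples of an RDF dump by subject so that each entity can be inspected together with its properties. It assumes the dump is serialized as N-Triples, which the abstract does not state. The `example.org` URIs, the property names, and the helper `group_by_subject` are hypothetical placeholders rather than actual MAKG identifiers, and the regular expression handles only IRIs and plain string literals.

```python
import re
from collections import defaultdict

# Hypothetical sample in N-Triples syntax; the example.org URIs are
# placeholders, not actual MAKG identifiers or properties.
SAMPLE = """\
<http://example.org/paper/1> <http://example.org/prop/title> "A Sample Paper" .
<http://example.org/paper/1> <http://example.org/prop/creator> <http://example.org/author/7> .
<http://example.org/author/7> <http://example.org/prop/name> "Jane Doe" .
<http://example.org/author/7> <http://example.org/prop/memberOf> <http://example.org/affiliation/3> .
"""

# Simplified pattern: IRI subject, IRI predicate, and an object that is
# either an IRI or a plain string literal (no language tag or datatype).
TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"((?:[^"\\]|\\.)*)")\s*\.\s*$'
)


def group_by_subject(ntriples):
    """Return a dict mapping each subject IRI to its (predicate, object) pairs."""
    entities = defaultdict(list)
    for line in ntriples.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            continue  # skip triple forms the simplified pattern does not cover
        subject, predicate, iri, literal = match.groups()
        obj = iri if iri is not None else literal
        entities[subject].append((predicate, obj))
    return dict(entities)


entities = group_by_subject(SAMPLE)
for subject, properties in entities.items():
    print(subject)
    for predicate, obj in properties:
        print("   ", predicate, "->", obj)
```

For a dump with billions of triples, an in-memory dictionary like this would not scale; a triple store or a streaming RDF parser would be the more realistic choice, and the sketch is only meant to show the entity-centric view of the data.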
