Exploring Large Scholarly Networks with Hermes

Every year, the number of scientific publications increases, adding complexity to the networks of collaborations, citations, and topics in which papers are embedded. Efficient tools for analyzing these networks help researchers identify relevant work and understand scientific impact. However, existing tools have several limitations and leave room for improvement. We present Hermes, a prototype for exploring large, heterogeneous scholarly networks. Hermes lets users navigate diverse types of networks within a single graph spanning hundreds of millions of nodes and relationships. Our prototype achieves reasonable responsiveness on commodity hardware through: a) comprehensive indexing, b) a careful coupling of a graph database and a search engine, and c) incremental processing of temporal queries. In this demonstration, we explain the techniques we adopt and illustrate how to use Hermes to explore the Microsoft Academic Graph.
