Machine Learning within a Graph Database: A Case Study on Link Prediction for Scholarly Data

When data management and ML tools are combined, a common problem is that ML frameworks may require moving data out of its usual storage (i.e., databases) for model building. In such scenarios, it can be more effective to adopt in-database statistical functionality (Cohen et al., 2009). Such functionality has received attention for relational databases, but for graph database systems there are too few studies to guide users, either by clarifying the role of the database or by identifying the pain points that require attention. In this paper we present an early feasibility study of such processing in the graph domain, prototyping an in-database, ML-driven case study on link prediction on a state-of-the-art graph database (Neo4j). We identify a general series of steps and a common-sense approach for database support. We find limited differences between the processing setups in most steps, suggesting a need for further evaluation. We identify bulk feature calculation as the most time-consuming task, at both the model building and inference stages, and hence we define it as a focus area for improving how graph databases support ML workloads.
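To make "bulk feature calculation" concrete, the following is a minimal sketch of computing topological features for every candidate node pair in a small graph. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which features were used, so three common link-prediction scores (common neighbors, Jaccard coefficient, preferential attachment) are assumed here. The toy co-authorship graph and all function names are hypothetical, and the code runs in plain Python rather than inside Neo4j.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def pair_features(adj, u, v):
    """Compute three assumed topological features for one candidate pair."""
    nu, nv = adj.get(u, set()), adj.get(v, set())
    common = nu & nv
    union = nu | nv
    return {
        "common_neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        "preferential_attachment": len(nu) * len(nv),
    }


def bulk_features(adj, pairs):
    """Compute features for many pairs at once; this loop is the bulk step."""
    return {(u, v): pair_features(adj, u, v) for u, v in pairs}


# Hypothetical toy co-authorship graph (author names are made up).
edges = [("ana", "bo"), ("ana", "cy"), ("bo", "cy"), ("cy", "di"), ("di", "eli")]
adj = build_adjacency(edges)

# Candidate links: every pair of nodes that is not already connected.
candidates = [(u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]]
features = bulk_features(adj, candidates)

for pair, feats in features.items():
    print(pair, feats)
```

The number of candidate pairs grows quadratically with the number of nodes, which is one plausible reason this step dominates running time at both model building and inference.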

[1] Eduard Hovy, et al. Decompressing Knowledge Graph Representations for Link Prediction, 2019, ArXiv.

[2] Alneu de Andrade Lopes, et al. Link prediction in graph construction for supervised and semi-supervised learning, 2015, 2015 International Joint Conference on Neural Networks (IJCNN).

[3] Vincent Lepetit, et al. Supervised Feature Learning for Curvilinear Structure Segmentation, 2013, MICCAI.

[4] Ravneet Kaur, et al. A survey of data mining and social network analysis based anomaly detection techniques, 2016.

[5] Jun Yang, et al. Data Management in Machine Learning: Challenges, Techniques, and Systems, 2017, SIGMOD Conference.

[6] Dan Olteanu, et al. Learning Models over Relational Data: A Brief Tutorial, 2019, SUM.

[7] Wenwu Zhu, et al. Structural Deep Network Embedding, 2016, KDD.

[8] Joseph M. Hellerstein, et al. MAD Skills: New Analysis Practices for Big Data, 2009, Proc. VLDB Endow.

[9] Pasquale Minervini, et al. Convolutional 2D Knowledge Graph Embeddings, 2017, AAAI.

[10] Palash Goyal, et al. Graph Embedding Techniques, Applications, and Performance: A Survey, 2017, Knowl. Based Syst.

[11] Henry N. Adorna, et al. Link Prediction in a Modified Heterogeneous Bibliographic Network, 2012, 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining.

[12] Toktam A. Oghaz, et al. Review on Graph Feature Learning and Feature Extraction Techniques for Link Prediction, 2019, ArXiv.

[13] Andriy Burkov, et al. The Hundred-Page Machine Learning Book, 2019.

[14] Chintan Bhatt, et al. Enhance Link Prediction in Online Social Networks Using Similarity Metrics, Sampling, and Classification, 2018.

[15] Mohammad Al Hasan, et al. Link prediction using supervised learning, 2006.

[16] Sourabh Vartak, et al. A Survey on Link Prediction, 2008.

[17] Owen Rambow, et al. Social Network Analysis of Alice in Wonderland, 2012, CLfL@NAACL-HLT.

[18] P. Talukdar, et al. InteractE: Improving Convolution-based Knowledge Graph Embeddings by Increasing Feature Interactions, 2019, AAAI.

[19] Ruijiang Li, et al. Relation Embedding with Dihedral Group in Knowledge Graph, 2019, ACL.

[20] Charu C. Aggarwal, et al. A Survey of Signed Network Mining in Social Media, 2015, ACM Comput. Surv.

[21] Jacob Eisenstein, et al. Predicting Semantic Relations using Global Graph Properties, 2018, EMNLP.

[22] Timothy M. Hospedales, et al. Hypernetwork Knowledge Graph Embeddings, 2018, ICANN.

[23] Yan Xu, et al. Link Prediction in Microblog Network Using Supervised Learning with Multiple Features, 2016, J. Comput.

[24] Yoshihiro Yamanishi, et al. Link propagation: A fast semisupervised learning algorithm for link prediction, 2009.

[25] Haining Wang, et al. Detecting Social Spam Campaigns on Twitter, 2012, ACNS.

[26] Yang Song, et al. An Overview of Microsoft Academic Service (MAS) and Applications, 2015, WWW.

[27] Kevin W. Boyack, et al. OpenOrd: an open-source toolbox for large graph layout, 2011, Electronic Imaging.