Graph data management systems for new application domains

Graph data management has long been a topic of interest for database researchers. The topic has recently gained renewed interest, motivated by the rapid emergence of new application domains including social networks and the Web of data. This tutorial characterizes graph data management techniques and categorizes recent graph data management systems. We focus on the management of very large graphs, such as social networks or the Web of data, rather than on the management of many smaller graphs (which frequently appear in bioinformatics and cheminformatics). The first part of this tutorial describes the requirements imposed by new application domains and provides a classification of recent systems according to their data and computation models. Our classification also highlights the main representations used to store the graph (dense/sparse native graphs, triple storage, or relational layouts), as well as the access patterns and typical queries considered (reachability or neighborhood queries, updates versus reads, transactional requirements, and graph consistency models). In the second part of this tutorial, we map the data and computation models to concrete graph management systems, highlighting target application domains, implementation techniques, scalability, and workload requirements. We pay special attention to declarative models that allow query optimization (as performed in Horton [1] and Neo4j [2]), and contrast them with procedural models (such as Pregel [3]), which are more general but severely limit optimizations.