PG-Keys: Keys for Property Graphs

We report on a community effort between industry and academia to shape the future of property graph constraints. Standardization of a property graph query language is currently underway through the ISO Graph Query Language (GQL) project. Our position is that this project should pay close attention to schemas and constraints, and should focus next on key constraints. The main purposes of keys are to enforce data integrity and to allow objects to be referenced and identified. Motivated by use cases from our industry partners, we argue that key constraints should support different modes, which are combinations of basic restrictions that require the key to be exclusive, mandatory, and singleton. Moreover, keys should be applicable to nodes, edges, and properties, since all of these can represent valid real-life entities. Our result is PG-Keys, a flexible and powerful framework for defining key constraints, which fulfills the above goals. PG-Keys is a design by the Linked Data Benchmark Council's Property Graph Schema Working Group, which consists of members from industry, academia, and the ISO GQL standards group, and it intends to bring the best of all worlds to property graph practitioners. PG-Keys aims to guide the evolution of the standardization efforts towards making systems more useful, powerful, and expressive.
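
To make the three basic restrictions concrete, the following is a minimal sketch in Python, not PG-Keys syntax. It assumes one possible reading of the restrictions, which the abstract does not spell out: exclusive means no key value is shared by two targets, mandatory means every target has at least one key value, and singleton means every target has at most one key value. The data layout and all function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, Hashable, List, Set

# Hypothetical toy representation: each target (a node, edge, or property id)
# maps to the set of key values found for it.
KeyValues = Dict[str, Set[Hashable]]


def is_exclusive(kv: KeyValues) -> bool:
    """Assumed reading: no key value belongs to two different targets."""
    owner: Dict[Hashable, str] = {}
    for target, values in kv.items():
        for value in values:
            if value in owner and owner[value] != target:
                return False
            owner[value] = target
    return True


def is_mandatory(kv: KeyValues) -> bool:
    """Assumed reading: every target has at least one key value."""
    return all(len(values) >= 1 for values in kv.values())


def is_singleton(kv: KeyValues) -> bool:
    """Assumed reading: every target has at most one key value."""
    return all(len(values) <= 1 for values in kv.values())


def violated_restrictions(kv: KeyValues) -> List[str]:
    """Return the names of the basic restrictions that the data violates."""
    checks = [
        ("exclusive", is_exclusive),
        ("mandatory", is_mandatory),
        ("singleton", is_singleton),
    ]
    return [name for name, check in checks if not check(kv)]


if __name__ == "__main__":
    # n1 and n2 share a value, n2 has two values, n3 has none.
    people = {
        "n1": {"alice@example.org"},
        "n2": {"bob@example.org", "alice@example.org"},
        "n3": set(),
    }
    print(violated_restrictions(people))
```

Under these assumptions, a key mode is simply a chosen subset of the three checks. A mode that demands all three behaves like a conventional identifier, while a mode that demands only exclusivity tolerates targets with missing or multiple key values.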
