Compact Summaries of Rich Heterogeneous Graphs

Large data graphs with complex and heterogeneous structure, possibly featuring typed data and an ontology encoding the application-domain semantics, are now widely used. The literature provides many solutions for building succinct representations of graphs, called summaries, in particular based on graph quotients through an equivalence relation between graph nodes. We consider efficient and compact summarization of rich heterogeneous graphs, in particular RDF graphs, which may feature data edges, typed nodes, and an ontology. First, we devise new graph node equivalence relations that are particularly tolerant of structural heterogeneity; they lead to compact yet informative quotient summaries. Second, we show how to extend any node equivalence relation (including, but not limited to, ours) to types and ontologies, and we provide the first in-depth study of the interplay between quotient summarization and RDF graph saturation, which defines the semantics of an RDF graph, in particular in the presence of an ontology. We establish a sufficient condition on a node equivalence relation which, if met, enables an efficient method, called shortcut, for summarizing RDF graphs. Finally, we describe novel, efficient, incremental algorithms for summarizing graphs with our node equivalence relations, and we report experiments validating their performance.
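To make the notion of a quotient summary concrete, the following is a minimal sketch in Python, not the algorithms of this work. It assumes a graph given as a set of (subject, property, object) triples, and it uses a deliberately simple, hypothetical equivalence relation (two nodes are equivalent when they have the same set of outgoing property labels), which is not one of the relations introduced here. The names `quotient_summary` and `outgoing_labels` are illustrative assumptions.

```python
def quotient_summary(triples, key):
    """Build a quotient graph: one summary node per equivalence class.

    `key(node, triples)` returns a hashable value identifying the
    equivalence class of `node` (an assumption of this sketch).
    """
    nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
    # Map every node to the identifier of its equivalence class.
    cls = {n: key(n, triples) for n in nodes}
    # Each edge s -p-> o becomes cls[s] -p-> cls[o]; duplicates collapse
    # because the result is a set.
    return {(cls[s], p, cls[o]) for s, p, o in triples}


def outgoing_labels(node, triples):
    """Hypothetical equivalence: same set of outgoing property labels."""
    return frozenset(p for s, p, _ in triples if s == node)


# A toy graph: two articles, two authors, and their literal-like values.
triples = {
    ("a1", "author", "p1"),
    ("a1", "title", "t1"),
    ("a2", "author", "p2"),
    ("a2", "title", "t2"),
    ("p1", "name", "n1"),
    ("p2", "name", "n2"),
}

summary = quotient_summary(triples, outgoing_labels)
for edge in sorted(summary, key=lambda e: e[1]):
    print(edge)
```

On this toy input the six edges collapse to three summary edges: the two articles fall into one class, the two authors into another, and all nodes without outgoing edges into a third. A relation that is more tolerant of heterogeneity would, for instance, still group an article that lacks a title with the others, which this strict sketch does not do.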