Compressing and Maintaining Statistics Information about Resource Occurrences in a Distributed RDF Store

In distributed RDF stores, triples are assigned to one or more storage and compute nodes. Query planning and optimization require statistical information about the occurrences of IRIs and literals on the individual storage and compute nodes. In this paper, we present a novel compressed storage format for this statistical information. If a resource occurs on only a few storage and compute nodes, its statistics can be updated with a single read and a single write operation. In our experiments, this storage format reduced the time needed to collect the statistical information by up to 97% and the required storage space by up to 99%.