Efficient updates for a shared-nothing analytics platform

In this paper we describe a cloud-based, data-warehouse-like system targeted specifically at time series data. Beyond the benefits of distributed storage built on a shared-nothing architecture, our system is designed to handle continuous, online updates of temporally ordered data efficiently without compromising query throughput. Through a fully customizable process that asynchronously aggregates past records, we achieve significant savings in storage and update time compared to traditional methods, while maintaining high accuracy in query responses for our target application. Experiments with our prototype implementation on a real testbed show that our scheme accelerates the update procedure by a factor of more than 3 and reduces the required storage by at least 30%. We also show how these gains vary with the level and rate of aggregation performed.
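To make the idea of aggregating past records concrete, the following is a minimal sketch under our own assumptions, not the system's actual implementation: records older than a hypothetical age threshold are replaced by one averaged record per fixed-width time bucket, while recent records are kept at full resolution. The names `Record`, `aggregate_past`, `age_threshold` and `bucket_width` are illustrative, and averaging is only one possible aggregate function. In a deployment, such a step would presumably run as a background task, separate from the update path, which is what would make it asynchronous; here it is a plain function.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Record:
    timestamp: int  # hypothetical: seconds since some epoch
    value: float


def aggregate_past(records, now, age_threshold, bucket_width):
    """Hypothetical sketch: collapse records older than (now - age_threshold)
    into one averaged record per bucket of `bucket_width` seconds.
    Records at or after the cutoff are returned unchanged."""
    cutoff = now - age_threshold
    recent = [r for r in records if r.timestamp >= cutoff]

    # Group old records by the start of their time bucket.
    buckets = {}
    for r in records:
        if r.timestamp < cutoff:
            start = r.timestamp - r.timestamp % bucket_width
            buckets.setdefault(start, []).append(r.value)

    # One coarse record per bucket, ordered by time, followed by recent data.
    aggregated = [Record(start, mean(vals)) for start, vals in sorted(buckets.items())]
    return aggregated + recent


# Usage: ten one-second samples; the six oldest collapse into two 3-second buckets.
recs = [Record(t, float(t)) for t in range(10)]
print(aggregate_past(recs, now=10, age_threshold=4, bucket_width=3))
```

In this sketch, `bucket_width` loosely plays the role of the aggregation level and the frequency with which the function is invoked plays the role of the aggregation rate; larger buckets save more storage at the cost of coarser answers for queries over old data.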