Gathering Entropy at Large Scale with HAVEGE and BlobSeer

Large sequences of random information are the foundation for a large class of applications: security, online gambling games, large scale Monte-Carlo simulations, etc. Many such applications are distributed and run on large-scale infrastructures such as clouds and grids. In this context, the random generator plays a crucial role: it needs to achieve a high entropy, a high throughput and last but not least a high degree of security. Several ways to generate high-entropy random information securely exist. For example, HAVEGE generates random information by gathering entropy from internal processor states of the machine where it is running alongside the user applications. These internal states are inheritably volatile and impossible to tamper with in a controlled fashion by the applications running on it. A centralized approach however does not scale to the high throughput requirement in a large scale setting. In order to do so, the output of several such instances needs to be combined into a single output stream. While this certainly has a good potential to solve the high throughput requirement, the way the outputs of the instances are combined in a single stream becomes a new weak link that can negatively impact all three requirements and therefore has to be addressed properly. In this paper we propose a distributed random number generator that efficiently addresses the aforementioned issue. We introduce a series of mechanisms to preserve a high entropy and degree of security for the combined output result and implement them on top of BlobSeer, a data storage service specifically designed to offer a high throughput in large-scale deployments even under heavy access concurrency. Large-scale experiments were performed on the G5K testbed and demonstrate substantial benefits for our approach.