Pregel: a system for large-scale graph processing - "ABSTRACT"

Many practical computing problems concern large graphs. Standard examples include the Web graph and various social networks. The scale of these graphs (in some cases billions of vertices, trillions of edges) poses challenges to their efficient processing. Despite the ubiquity of large graphs and their commercial importance, we know of no scalable general-purpose system for implementing graph algorithms in a distributed environment. To address distributed processing of real-life graphs, we defined a model of computation and realized it through a scalable and fault-tolerant system called Pregel, with an expressive and flexible API. The high-level organization of Pregel programs is inspired by Valiant’s Bulk Synchronous Parallel model. Pregel computations consist of a sequence of iterations, called supersteps. During a superstep the framework invokes a user-defined Compute() function for each vertex, conceptually in parallel. The function specifies behavior at a single vertex v and a single superstep S. It can read messages sent to v in superstep S − 1, send messages to other vertices that will be received at superstep S + 1, and modify the state of v and its outgoing edges. Messages are typically sent along outgoing edges, but a message may be sent to any vertex whose identifier is known. A program terminates when all vertices declare that they are done. The input and output are both directed graphs. They are often but not always isomorphic, because vertices and edges can be added and removed during computation. User-defined handlers are applied to resolve conflicts for concurrent mutations. The vertex-centric flavor of programming in Pregel is similar to the MapReduce model in that programmers focus on a local action, processing a single item at a time, which the system then lifts to computation on a large dataset. The synchronicity of the Pregel model simplifies writing correct programs and simplifies reasoning about the inter-