Jumbo: a data-intensive distributed computation platform: design overview and preliminary experiment

In recent years, the volume of data processed by companies and research institutions has grown enormously, with datasets of terabytes or even petabytes now commonplace. This growth has driven the development of frameworks, such as Google's MapReduce, for the distributed processing of such large quantities of data on clusters of commodity PCs. However, many of these frameworks sacrifice baseline performance in exchange for reliability and scalability. In this paper, we introduce Jumbo, a system designed for experimentation with different approaches to large-scale data processing, and outline some of the problems it is intended to solve.