Jumbo: a data-intensive distributed computation platform: design overview and preliminary experiment

In recent years, the volume of data processed by companies and research institutions has grown enormously, with datasets of terabytes or even petabytes now commonplace. This growth has driven the development of frameworks, such as Google's MapReduce, for the distributed processing of such large quantities of data on clusters of commodity PCs. However, many of these frameworks sacrifice baseline performance in exchange for reliability and scalability. In this paper, we introduce Jumbo, a system designed for experimentation with different approaches to large-scale data processing, and outline some of the problems it is intended to solve.