Towards a next generation of scientific computing in the Cloud

More than ever, new forms of highly scalable, data-intensive computing are needed so that the next generation of scientific computing and analytics can effectively perform complex tasks on massive amounts of data, such as clustering, matrix computation, data mining, and information extraction. MapReduce, put forward by Google, is a well-known model for programming commodity computer clusters to perform large-scale data processing in a single pass. Hadoop, the most popular open-source implementation of the MapReduce model, provides a simple abstraction for large-scale distributed algorithms and has become a widely used distributed computing and data analysis paradigm in recent years. While Hadoop MapReduce is well suited to embarrassingly parallel problems, it struggles with iterative algorithms; as a consequence, many alternative frameworks supporting this class of algorithms have been created. In this paper, we propose an architecture for such a configuration, implemented in an SPC (Scientific Private Cloud) prototype, using the Hadoop 2.0 next-generation platform to allow the use of alternative programming frameworks in a hybrid approach, while retaining the scalability and fault tolerance of Hadoop MapReduce. By adapting scientific problems to execute in our Scientific Cloud, the experiments conducted show the effectiveness of the proposed model and its impact on the ease of handling these frameworks.