Towards a next generation of scientific computing in the Cloud

More than ever, new forms of highly scalable, data-intensive computing are needed so that the next generation of scientific computing and analytics can effectively perform complex tasks on massive amounts of data, such as clustering, matrix computation, data mining, and information extraction. MapReduce, put forward by Google, is a well-known model for programming commodity computer clusters to perform large-scale data processing in a single pass. Hadoop, the most popular open-source implementation of the MapReduce model, provides a simple abstraction for large-scale distributed algorithms and has become a widely used distributed computing and data analysis paradigm in recent years. While Hadoop MapReduce is well suited to embarrassingly parallel problems, it struggles with iterative algorithms; as a consequence, many alternative frameworks supporting this class of algorithms have been created. In this paper, we propose an architecture for such a configuration, implemented in an SPC (Scientific Private Cloud) prototype, using the Hadoop 2.0 next-generation platform to allow the use of alternative programming frameworks in a hybrid approach, while retaining the scalability and fault tolerance of Hadoop MapReduce. By adapting scientific problems to execute in our Scientific Cloud, the experiments conducted show the effectiveness of the proposed model and its impact on the ease of handling these frameworks.