Exploring the Use of Elastic Resource Federations for Enabling Large-Scale Scientific Workflows

An important class of scientific and engineering workflows, e.g., those used for uncertainty quantification, design optimization, and parametric studies, maps naturally onto the Many-Task Computing (MTC) paradigm. What distinguishes these workloads, however, is a unique combination of dynamically changing resource requirements and very large computational and throughput demands. Such workflows can benefit from an elastic execution infrastructure based on the dynamic federation of resources. The overarching goal of this paper is to explore the nature of such an elastic, dynamically federated platform, and to demonstrate experimentally that it can effectively support the targeted class of scientific and engineering workflows. As the driving application for our study, we use the problem of constructing a phase diagram in microfluidics, which is representative of a broader class of parameter space interrogation techniques. To satisfy its computational demand of 2.5 million core-hours within reasonable time limits, we construct a dynamic federation of ten HPC resources from six different computing centers. This experiment delivers the most comprehensive data on fluid flow in a microchannel with an obstacle. Moreover, it offers important insights that enable us to identify the key requirements and architectural components that a platform based on federated resources must provide in order to handle the considered scientific MTC workloads efficiently.
