Devising a Cloud Scientific Workflow Platform for Big Data

Scientific workflow management systems (SWFMSs) face unprecedented challenges from the big data deluge. Because revising all existing workflow applications to fit the Cloud computing paradigm is impractical, migrating SWFMSs into the Cloud to leverage the functionalities of both Cloud computing and SWFMSs may provide a viable approach to big data processing. In this paper, we first discuss the challenges facing scientific workflow applications and the available solutions in detail, and analyze the essential requirements for a scientific computing Cloud platform. We then propose a service framework to normalize the integration of SWFMSs with Cloud computing, and present our implementation experience based on this service framework. Finally, we set up a series of experiments to demonstrate the capability of our implementation, using a Montage Image Mosaic Workflow as a showcase.
