Optimizing bioinformatics workflows for data analysis using cloud management techniques

With the rapid development of high-throughput technologies in the life sciences in recent years, huge amounts of data are being generated and stored in databases. Despite significant advances in computing capacity and performance, analysing these large-scale data to find biomedically relevant patterns remains a challenging task. Scientific workflow applications support data mining in more complex scenarios that involve many data sources and computational tools, as is common in bioinformatics. A scientific workflow application is a holistic unit that defines, executes, and manages scientific applications built from different software tools. Existing workflow applications are process- or data-oriented rather than resource-oriented. Consequently, they lack efficient computational resource management capabilities, such as those provided by Cloud computing environments. Insufficient computational resources disrupt the execution of workflow applications, wasting time and money. To address this issue, advanced resource monitoring and management strategies are required that determine the resource consumption behaviour of workflow applications so that resources can be allocated and deallocated dynamically. In this paper, we present a novel Cloud resource monitoring technique and a knowledge management strategy that manage computational resources for workflow applications in order to guarantee their performance goals and their successful completion. We describe the design of these techniques, demonstrate how they can be applied to scientific workflow applications, and present first evaluation results as a proof of concept.
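The abstract does not specify how monitored resource consumption is turned into allocation decisions. Purely as an illustration, the Python sketch below shows one common pattern: average recent utilisation samples for a workflow step, compare them against upper and lower thresholds to decide whether to allocate or deallocate resources, and keep a small store of peak usage per step so that later runs can be provisioned in advance. Everything here is a hypothetical assumption, not the authors' technique. This includes the names `Sample`, `decide`, and `UsageKnowledgeBase`, the 0.85/0.30 thresholds, and the three-sample window.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOCATE = "allocate"
    DEALLOCATE = "deallocate"
    KEEP = "keep"


@dataclass
class Sample:
    """One monitoring measurement for a workflow step (hypothetical schema)."""
    cpu_util: float  # fraction of allocated CPU in use, 0.0-1.0
    mem_util: float  # fraction of allocated memory in use, 0.0-1.0


def decide(samples: list[Sample], upper: float = 0.85,
           lower: float = 0.30, window: int = 3) -> Action:
    """Hypothetical threshold rule over the last `window` samples."""
    if len(samples) < window:
        return Action.KEEP  # not enough evidence yet
    recent = samples[-window:]
    avg_cpu = sum(s.cpu_util for s in recent) / window
    avg_mem = sum(s.mem_util for s in recent) / window
    busiest = max(avg_cpu, avg_mem)
    if busiest > upper:
        return Action.ALLOCATE    # a resource is close to exhaustion
    if busiest < lower:
        return Action.DEALLOCATE  # all resources are largely idle
    return Action.KEEP


class UsageKnowledgeBase:
    """Hypothetical store of peak utilisation per workflow step across runs."""

    def __init__(self) -> None:
        self._peaks: dict[str, float] = {}

    def record(self, step: str, samples: list[Sample]) -> None:
        if not samples:
            return
        peak = max(max(s.cpu_util, s.mem_util) for s in samples)
        self._peaks[step] = max(peak, self._peaks.get(step, 0.0))

    def needs_extra_capacity(self, step: str, upper: float = 0.85) -> bool:
        # Steps that previously ran near their limits get more resources up front
        return self._peaks.get(step, 0.0) > upper


if __name__ == "__main__":
    history = [Sample(0.90, 0.50), Sample(0.95, 0.60), Sample(0.92, 0.55)]
    print(decide(history))  # -> Action.ALLOCATE
    kb = UsageKnowledgeBase()
    kb.record("alignment", history)
    print(kb.needs_extra_capacity("alignment"))  # -> True
```

Averaging over a short window keeps a single spike from triggering an allocation. Letting the busiest resource drive the decision reflects the point made above: a shortage of any one resource can disrupt a workflow.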
