ACOsched: A scheduling algorithm in a federated cloud infrastructure for bioinformatics applications

Task scheduling in a federated cloud environment is a complex problem because the federation comprises several cloud providers with distinct memory and storage capacities, all of which must be taken into account. This article focuses on the task scheduling problem in BioNimbuZ, a federated cloud infrastructure for executing bioinformatics applications that was previously proposed by our group. We present ACOsched, a scheduling algorithm based on Load Balancing Ant Colony Optimization (LBACO), which distributes tasks efficiently by finding the best cloud in the federation to execute each of them. We ran experiments with real biological data, executing the Bowtie mapping tool on one instance of BioNimbuZ composed of two cloud providers: Amazon EC2 and a bioinformatics laboratory at the University of Brasilia, Brazil. The results show that ACOsched significantly improved the makespan of Bowtie executions in BioNimbuZ compared with DynamicAHP, the simple round-robin algorithm previously developed for this federated cloud infrastructure.
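To make the idea of ant-colony-based task placement concrete, the following is a minimal illustrative sketch, not the ACOsched implementation itself. It assumes that each task has a known length and each cloud a single relative speed, and that the goal is to minimise makespan; the function name `aco_schedule` and the parameters `alpha`, `beta`, and `rho` are hypothetical choices made for this example only.

```python
import random


def aco_schedule(task_lengths, cloud_speeds, n_ants=10, n_iters=20,
                 alpha=1.0, beta=2.0, rho=0.1, seed=0):
    """Toy ant-colony scheduler (illustrative assumption, not ACOsched).

    Returns (assignment, makespan), where assignment[i] is the index of
    the cloud chosen for task i.
    """
    rng = random.Random(seed)
    n_clouds = len(cloud_speeds)
    # One pheromone value per (task, cloud) pair, all equal at the start.
    pheromone = [[1.0] * n_clouds for _ in task_lengths]
    best_assignment, best_makespan = None, float("inf")

    for _ in range(n_iters):
        for _ in range(n_ants):
            load = [0.0] * n_clouds  # accumulated busy time per cloud
            assignment = []
            for t, length in enumerate(task_lengths):
                # Load-balancing heuristic: favour the cloud on which this
                # task would finish earliest given the current load.
                weights = []
                for c in range(n_clouds):
                    finish = load[c] + length / cloud_speeds[c]
                    weights.append(pheromone[t][c] ** alpha
                                   * (1.0 / finish) ** beta)
                c = rng.choices(range(n_clouds), weights=weights)[0]
                load[c] += length / cloud_speeds[c]
                assignment.append(c)
            makespan = max(load)
            if makespan < best_makespan:
                best_assignment, best_makespan = assignment, makespan

        # Evaporate all trails, then reinforce the best schedule seen so far.
        for t in range(len(task_lengths)):
            for c in range(n_clouds):
                pheromone[t][c] *= (1.0 - rho)
            pheromone[t][best_assignment[t]] += 1.0 / best_makespan

    return best_assignment, best_makespan


if __name__ == "__main__":
    # Hypothetical workload: six tasks, two clouds of different speed.
    tasks = [4.0, 2.0, 7.0, 1.0, 3.0, 5.0]
    speeds = [1.0, 2.0]
    assignment, makespan = aco_schedule(tasks, speeds)
    print(assignment, makespan)
```

In this sketch each ant builds a complete schedule, choosing a cloud for every task with probability proportional to the pheromone trail times a load-aware desirability term. A real federated scheduler would also have to account for factors such as data transfer cost and provider capacity, which this toy model deliberately omits.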
