Executing a biological sequence comparison application on a federated cloud environment

Smith-Waterman (SW) is a popular Bioinformatics algorithm that computes the optimal local alignment, and its score, between two genomic sequences. Although SW yields the optimal result, it is not widely used in genome projects because of its high demands on computing power and memory. Cloud Computing has recently received considerable attention because it provides utility computing in an elastic environment. Its advantages can be obtained at zero cost, since many Public Clouds offer free usage quotas that let users run their applications without charge. Moreover, several Clouds can be combined and treated as a single environment, forming a Federated Cloud. In this paper, we propose and evaluate an approach for running the SW algorithm in Federated Clouds. We propose a hierarchical Multi-Cloud architecture that transparently connects and manages several Clouds. Results obtained with this architecture and our MapReduce SW implementation on five Public Clouds show that, using only the free quota, we were able to run the SW application over a huge genomic database in a time comparable to that obtained on multicore clusters, which shows the appropriateness of our approach.
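To make the computation concrete, the following is a minimal sketch of an SW score calculation arranged in a map/reduce style. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the scoring values (match, mismatch, linear gap penalty), the function names `smith_waterman_score`, `map_partition` and `reduce_top_hits`, and the idea that each database partition is handled by one Cloud are all hypothetical choices made for the example.

```python
import heapq


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a and b (linear gap penalty, assumed values)."""
    prev = [0] * (len(b) + 1)  # previous row of the score matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # A cell never drops below 0, which is what makes the alignment local.
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def map_partition(query, partition):
    """Map step: score the query against every (id, sequence) pair in one partition."""
    return [(seq_id, smith_waterman_score(query, seq)) for seq_id, seq in partition]


def reduce_top_hits(partials, k):
    """Reduce step: merge the per-partition results and keep the k best scores."""
    all_hits = (hit for part in partials for hit in part)
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Toy database split into two partitions, standing in for two Clouds.
    partitions = [
        [("s1", "ACGT"), ("s2", "TTTT")],
        [("s3", "ACGA")],
    ]
    partials = [map_partition("ACGT", part) for part in partitions]
    print(reduce_top_hits(partials, k=2))
```

Only the current and previous rows of the score matrix are kept, so memory grows with the length of one sequence rather than with the product of both lengths. Recovering the alignment itself, and not only its score, would require additional bookkeeping that this sketch omits.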
