Self-service infrastructure container for data-intensive applications

Cloud-based scientific data management (storage, transfer, analysis, and inference extraction) is attracting growing interest. In this paper, we propose a next-generation cloud deployment model suitable for data-intensive applications. Our model is a flexible, self-service, container-based infrastructure that delivers network, computing, and storage resources together with the logic to manage these components dynamically and holistically. We demonstrate the strength of our model with a bioinformatics application. Dynamic algorithms for resource provisioning and job allocation, suited to the chosen dataset, are packaged and delivered in a privileged virtual machine as part of the container. We tested the model on our private internal experimental cloud, which is built on low-cost commodity hardware, and demonstrate its capability to create the required network and computing resources and to allocate submitted jobs. The results show the benefits of increased automation: a significant reduction both in the time needed to complete a data analysis and in the cost of the analysis. For a 15 GB analysis, the proposed algorithms reduced the cost of performing the analysis by 50% and reduced the total time between submitting a job and writing the results by more than 1 hr.
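The abstract does not specify how the provisioning and job-allocation algorithms work. Purely as an illustration of the general idea, and not as the authors' method, the Python sketch below shows one generic way a dynamic allocator can trade the number of VMs against completion time. Each job goes to the VM that would finish it earliest, and a new VM is provisioned only when no existing VM can meet a deadline. The `VM` class, the `allocate` function, the fixed per-VM processing rate, and the deadline parameter are all hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class VM:
    """Hypothetical worker VM that processes data at a fixed rate."""
    name: str
    rate_gb_per_hr: float
    queued_gb: float = 0.0

    def finish_time_hr(self, extra_gb: float = 0.0) -> float:
        # Time to drain the current queue plus an extra job of extra_gb.
        return (self.queued_gb + extra_gb) / self.rate_gb_per_hr


def allocate(jobs_gb: List[float], deadline_hr: float,
             rate_gb_per_hr: float = 5.0) -> Tuple[List[VM], List[Tuple[int, str]]]:
    """Greedy sketch (hypothetical, not the paper's algorithm): put each job on
    the VM that would finish it earliest, and provision a new VM only when no
    existing VM meets the deadline. A job too large for the deadline even on a
    fresh VM is still placed on a new VM, and the deadline is then missed."""
    vms: List[VM] = []
    placement: List[Tuple[int, str]] = []
    for i, size in enumerate(jobs_gb):
        best: Optional[VM] = min(vms, key=lambda vm: vm.finish_time_hr(size), default=None)
        if best is None or best.finish_time_hr(size) > deadline_hr:
            # Elastic step: scale out by provisioning one more VM.
            best = VM(name=f"vm-{len(vms)}", rate_gb_per_hr=rate_gb_per_hr)
            vms.append(best)
        best.queued_gb += size
        placement.append((i, best.name))
    return vms, placement


if __name__ == "__main__":
    for deadline in (1.5, 3.0):
        vms, placement = allocate([5.0, 5.0, 5.0], deadline_hr=deadline)
        print(deadline, len(vms), placement)
```

With three 5 GB jobs and an assumed rate of 5 GB/hr per VM, a 1.5 hr deadline forces three VMs, while a 3 hr deadline fits all three jobs on one VM. This only illustrates the general cost-time trade-off that dynamic provisioning manages; it does not reproduce the paper's measured results.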