Parallel stochastic systems biology in the cloud

The stochastic modelling of biological systems, coupled with Monte Carlo simulation of the models, is an increasingly popular technique in bioinformatics. The simulation-analysis workflow can be computationally expensive, which reduces the interactivity required for model tuning. In this work, we advocate high-level software design as a vehicle for building efficient and portable parallel simulators for the cloud. In particular, we present and discuss the Calculus of Wrapped Compartments (CWC) simulator for systems biology, which is designed according to the FastFlow pattern-based approach. Thanks to the FastFlow framework, the CWC simulator is designed as a high-level workflow that can simulate CWC models, merge simulation results and statistically analyse them in a single parallel workflow in the cloud. To improve interactivity, successive phases are pipelined so that the workflow begins to output a stream of analysis results immediately after the simulation starts. The performance and effectiveness of the CWC simulator are validated on the Amazon Elastic Compute Cloud.
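The pipelined simulate, merge and analyse structure described above can be illustrated with a minimal sketch. This is not the CWC simulator and does not use FastFlow: Python generators stand in sequentially for the parallel pipeline stages, and the birth-death model, its rates, and every function name (`simulate`, `merge`, `analyse`, `workflow`) are assumptions made only for illustration.

```python
import random
import statistics
from typing import Iterable, Iterator, List, Tuple


def simulate(seed: int, n_samples: int, dt: float,
             birth: float = 10.0, death: float = 0.1) -> Iterator[Tuple[float, int]]:
    """Stage 1: one Monte Carlo trajectory of a toy birth-death process
    (Gillespie-style), sampled at times 0, dt, 2*dt, ... (hypothetical model)."""
    rng = random.Random(seed)
    t, x, k = 0.0, 0, 0
    while k < n_samples:
        total = birth + death * x              # total reaction propensity
        t_next = t + rng.expovariate(total)    # time of the next reaction
        # Emit every sample point that falls before the next reaction.
        while k < n_samples and k * dt < t_next:
            yield k * dt, x
            k += 1
        # Choose which reaction fires, proportionally to its propensity.
        if rng.random() * total < birth:
            x += 1
        else:
            x -= 1
        t = t_next


def merge(trajectories: List[Iterator[Tuple[float, int]]]) -> Iterator[Tuple[float, List[int]]]:
    """Stage 2: align all trajectories and emit one cut per sample time."""
    for samples in zip(*trajectories):
        yield samples[0][0], [x for _, x in samples]


def analyse(cuts: Iterable[Tuple[float, List[int]]]) -> Iterator[Tuple[float, float, float]]:
    """Stage 3: per-cut statistics, emitted as soon as each cut is available."""
    for t, values in cuts:
        yield t, statistics.mean(values), statistics.pvariance(values)


def workflow(n_runs: int, n_samples: int, dt: float) -> Iterator[Tuple[float, float, float]]:
    """Chain the three stages; results stream out lazily, one sample time at a time."""
    runs = [simulate(seed, n_samples, dt) for seed in range(n_runs)]
    return analyse(merge(runs))


if __name__ == "__main__":
    for t, mean, var in workflow(n_runs=100, n_samples=5, dt=1.0):
        print(f"t={t:.1f}  mean={mean:.2f}  variance={var:.2f}")
```

Because each stage is a generator, the first statistics are produced before any trajectory has finished, which mirrors the streaming behaviour described in the abstract. In this sketch the stages run interleaved on a single thread rather than in parallel.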
