Using the Cloud for Parameter Estimation Problems: Comparing Spark vs MPI with a Case-Study

Systems biology is an emerging approach focused in generating new knowledge about complex biological systems by combining experimental data with mathematical modeling and advanced computational techniques. Many problems in this field are extremely challenging and require substantial supercomputing resources to be solved. This is the case of parameter estimation in large-scale nonlinear dynamic systems biology models. Recently, Cloud Computing has emerged as a new paradigm for on-demand delivery of computing resources. However, scientific computing community has been quite hesitant in using the Cloud, simply because traditional programming models do not fit well with the new paradigm, and the earliest cloud programming models do not allow most scientific computations being efficiently run in the Cloud. In this paper we explore and compare two distributed computing models: the MPI (message-passing interface) model, that is high-performance oriented, and the Spark model, which is throughput oriented but outperforms other cloud programming solutions adding improved support for iterative algorithms through in-memory computing. The performance of a very well known metaheuristic, the Differential Evolution algorithm, has been thoroughly assessed using a challenging parameter estimation problem from the domain of computational systems biology. The experiments have been carried out both in a local cluster and in the Microsoft Azure public cloud, allowing performance and cost evaluation for both infrastructures.

[1]  Felix Naumann,et al.  The Stratosphere platform for big data analytics , 2014, The VLDB Journal.

[2]  P. N. Suganthan,et al.  Differential Evolution: A Survey of the State-of-the-Art , 2011, IEEE Transactions on Evolutionary Computation.

[3]  Michael J. Franklin,et al.  Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing , 2012, NSDI.

[4]  John Shalf,et al.  Defining future platform requirements for e-Science clouds , 2010, SoCC '10.

[5]  Rajkumar Buyya,et al.  MRPGA: An Extension of MapReduce for Parallelizing Genetic Algorithms , 2008, 2008 IEEE Fourth International Conference on eScience.

[6]  Yu-Ting Hsiao,et al.  Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment , 2014, BMC Systems Biology.

[7]  Jie Li,et al.  eScience in the cloud: A MODIS satellite data reprojection and reduction pipeline in the Windows Azure platform , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).

[8]  Julio R. Banga,et al.  Optimization in computational systems biology , 2008, BMC Systems Biology.

[9]  Geoffrey C. Fox,et al.  Twister: a runtime for iterative MapReduce , 2010, HPDC '10.

[10]  Julio R. Banga,et al.  Implementing Parallel Differential Evolution on Spark , 2016, EvoApplications.

[11]  Jerome Lauret,et al.  Virtual workspaces for scientific applications. , 2007 .

[12]  Sushil K. Prasad,et al.  Towards an MPI-Like Framework for the Azure Cloud Platform , 2014, 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.

[13]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[14]  Julio R. Banga,et al.  Reverse engineering and identification in systems biology: strategies, perspectives and challenges , 2014, Journal of The Royal Society Interface.

[15]  Atanas Radenski,et al.  Distributed Simulated Annealing with MapReduce , 2012, EvoApplications.

[16]  Scott Hazelhurst,et al.  Scientific computing using virtual high-performance computing: a case study using the Amazon elastic computing cloud , 2008, SAICSIT '08.

[17]  Julio R. Banga,et al.  Enhanced parallel Differential Evolution algorithm for problems in computational systems biology , 2015, Appl. Soft Comput..

[18]  Chi Zhou Fast parallelization of differential evolution algorithm using MapReduce , 2010, GECCO '10.

[19]  Wenguang Chen,et al.  Cloud versus in-house cluster: Evaluating Amazon cluster compute instances for running MPI applications , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[20]  Alexandru Iosup,et al.  An Early Performance Analysis of Cloud Computing Services for Scientific Computing , 2008 .

[21]  Ville Tirronen,et al.  On memetic Differential Evolution frameworks: A study of advantages and limitations in hybridization , 2008, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).

[22]  Kevin D. Seppi,et al.  Parallel PSO using MapReduce , 2007, 2007 IEEE Congress on Evolutionary Computation.

[23]  Zhiwei Xu,et al.  DataMPI: Extending MPI to Hadoop-Like Big Data Computing , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.

[24]  Sabela Ramos,et al.  Performance analysis of HPC applications in the cloud , 2013, Future Gener. Comput. Syst..

[25]  James Demmel,et al.  Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies , 2016, ArXiv.

[26]  Rainer Storn,et al.  Differential Evolution – A Simple and Efficient Heuristic for global Optimization over Continuous Spaces , 1997, J. Glob. Optim..

[27]  Carmen G. Moles,et al.  Parameter estimation in biochemical pathways: a comparison of global optimization methods. , 2003, Genome research.

[28]  John Shalf,et al.  Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud , 2010, 2010 IEEE Second International Conference on Cloud Computing Technology and Science.

[29]  M S Turner,et al.  Modelling genetic networks with noisy and varied experimental data: the circadian clock in Arabidopsis thaliana. , 2005, Journal of theoretical biology.

[30]  Constantinos Evangelinos,et al.  Cloud Computing for parallel Scientific HPC Applications: Feasibility of Running Coupled Atmosphere- , 2008 .

[31]  Paolo Bientinesi,et al.  Can cloud computing reach the top500? , 2009, UCHPC-MAW '09.

[32]  Julio R. Banga,et al.  Evaluation of Parallel Differential Evolution Implementations on MapReduce and Spark , 2016, Euro-Par Workshops.

[33]  Xavier Llorà,et al.  Scaling Genetic Algorithms Using MapReduce , 2009, 2009 Ninth International Conference on Intelligent Systems Design and Applications.

[34]  Ponnuthurai N. Suganthan,et al.  Recent advances in differential evolution - An updated survey , 2016, Swarm Evol. Comput..

[35]  Enrique Alba,et al.  Parallel metaheuristics: recent advances and new trends , 2012, Int. Trans. Oper. Res..