From Desktop to Large-Scale Model Exploration with Swift/T

As high-performance computing resources have become increasingly available, new modes of computational processing and experimentation have become possible. This tutorial presents the Extreme-scale Model Exploration with Swift/T (EMEWS) framework. EMEWS combines existing model exploration methods (e.g., model calibration, metaheuristics, data assimilation) and simulations (or any “black box” application code) with the Swift/T parallel scripting language. The resulting scientific workflows run on a variety of computing resources, from desktops to academic clusters to TOP500-class supercomputers. We present a series of use-cases. They start with a simple parameter sweep of an agent-based model and end with a complex adaptive parameter space exploration workflow that coordinates ensembles of distributed simulations. The use-cases are published in a public repository so that interested readers can download and run them on their own.
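The simplest use-case above is a parameter sweep, which evaluates a black-box model at every point of a parameter grid. The adaptive workflows generalize it by letting an exploration algorithm choose each new batch of points from earlier results. The sketch below illustrates both patterns in Python rather than Swift/T, and it is not EMEWS code.

It rests on several assumptions:

- `run_model` is a hypothetical stand-in for a simulation.
- A local thread pool stands in for the distributed workers that Swift/T would manage. In Swift/T, a sweep is naturally a `foreach` loop whose iterations can run concurrently.
- `adaptive_search` is a hypothetical refine-around-the-best rule standing in for a real calibration or metaheuristic method.

```python
"""Illustrative sketch of a parameter sweep and an adaptive exploration loop.

Assumptions (not taken from EMEWS or Swift/T):
- run_model is a hypothetical stand-in for a black-box simulation;
- a local thread pool stands in for distributed workers;
- adaptive_search is a hypothetical refinement rule, not a published method.
"""
import itertools
from concurrent.futures import ThreadPoolExecutor


def run_model(params):
    """Hypothetical black-box model: returns a score (lower is better)."""
    a, b = params
    return (a - 0.3) ** 2 + (b - 0.7) ** 2


def grid(a_values, b_values):
    """All (a, b) combinations of the two parameter ranges."""
    return list(itertools.product(a_values, b_values))


def sweep(points, workers=4):
    """Evaluate every point concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_model, points))


def adaptive_search(center, step, rounds, workers=4):
    """Hypothetical adaptive loop: each round sweeps a 3x3 grid around the
    current best point, moves to the best result, and halves the step."""
    best = center
    for _ in range(rounds):
        a, b = best
        offsets = (-step, 0.0, step)
        batch = grid([a + d for d in offsets], [b + d for d in offsets])
        scores = sweep(batch, workers)
        best = batch[scores.index(min(scores))]
        step /= 2
    return best


if __name__ == "__main__":
    values = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
    points = grid(values, values)
    scores = sweep(points)
    print("best grid point:", points[scores.index(min(scores))])
    print("adaptive result:", adaptive_search((0.5, 0.5), 0.25, rounds=6))
```

The sketch uses threads because of an assumption about the workload. When each task launches an external simulation executable, the heavy work already runs in separate processes. A CPU-bound model written in pure Python would instead call for a process pool or, at scale, a workflow system such as Swift/T.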
