Efficient simulation of agent-based models on multi-GPU and multi-core clusters

We present an effective latency-hiding mechanism for parallelizing agent-based model simulations (ABMS) with millions of agents. The mechanism is designed to accommodate both the hierarchical organization and the heterogeneity of current state-of-the-art parallel computing platforms. We use it to explore the computation vs. communication trade-off continuum made available by the deep computational and memory hierarchies of extant platforms, and we present a novel analytical model of this trade-off. We describe our implementation and report preliminary performance results on two distinct parallel platforms suitable for ABMS: CUDA threads on multiple, networked graphics processing units (GPUs), and pthreads on multi-core processors. The Message Passing Interface (MPI) is used for inter-GPU as well as inter-socket communication on a cluster of multiple GPUs and multi-core processors. Results indicate the benefits of our latency-hiding scheme, which in the best cases delivers over 100-fold improvement in runtime for certain benchmark ABMS application scenarios with several million agents. This speedup is obtained on a system that, on a single GPU, is already two to three orders of magnitude faster than an equivalent CPU-based execution in a popular Java simulator. Combining the two factors, the overall execution in our current work is over four orders of magnitude faster when run on multiple GPUs.
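To make the computation vs. communication trade-off concrete, the following is a minimal Python sketch of a toy cost model. It is not the analytical model of this work, which the abstract does not spell out. It assumes one common way of trading computation for communication on a grid-partitioned simulation: each processor keeps a ghost zone of width `R` cells around its `n` x `n` sub-grid, exchanges boundaries only once every `R` time steps, and redundantly recomputes the ghost cells in between. All names and parameters (`time_per_step`, `best_ghost_width`, `t_cell`, `latency`, `t_xfer`) are hypothetical, and corner messages are ignored for simplicity.

```python
def time_per_step(n, R, t_cell, latency, t_xfer):
    """Average wall time per simulation step under the toy model.

    n       : side length of the square sub-grid owned by one processor
    R       : ghost-zone width, also the number of steps between exchanges
    t_cell  : time to update one cell (assumed constant)
    latency : fixed start-up cost of one message
    t_xfer  : transfer time per cell of message payload
    """
    # Between exchanges, the valid region shrinks by one cell per side each
    # step, so step k updates a square of side n + 2 * (R - 1 - k).
    compute = sum(t_cell * (n + 2 * (R - 1 - k)) ** 2 for k in range(R))
    # One exchange per R steps: four neighbors, each sent an n-by-R strip.
    comm = 4 * (latency + t_xfer * n * R)
    return (compute + comm) / R


def best_ghost_width(n, t_cell, latency, t_xfer, max_R=64):
    """Ghost-zone width in 1..max_R that minimizes the modeled step time."""
    return min(range(1, max_R + 1),
               key=lambda R: time_per_step(n, R, t_cell, latency, t_xfer))


if __name__ == "__main__":
    # Illustrative numbers only; they are not measurements from the paper.
    for latency in (0.0, 1e-5, 1e-3):
        R = best_ghost_width(n=1000, t_cell=1e-9, latency=latency, t_xfer=1e-9)
        print(f"latency={latency:g}s -> best R={R}")
```

Under these assumptions, a zero-latency interconnect favors `R = 1` (no redundant computation), while higher message latency pushes the optimum toward wider ghost zones, since each message start-up cost is amortized over more steps.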
