Behavioral simulations in MapReduce

In many scientific domains, researchers are turning to large-scale behavioral simulations to better understand real-world phenomena. Although the high-performance computing community has produced a great deal of work on simulation tools, behavioral simulations remain difficult to program and to scale automatically in parallel environments. In this paper we present BRACE (Big Red Agent-based Computation Engine), which extends the MapReduce framework to process these simulations efficiently across a cluster. BRACE leverages spatial locality to treat behavioral simulations as iterated spatial joins, which greatly reduces communication between nodes. In our experiments we achieve nearly linear scale-up on several realistic simulations.

Although processing behavioral simulations in parallel as iterated spatial joins can be very efficient, it can be much simpler for domain scientists to program the behavior of a single agent. Furthermore, many simulations involve a considerable amount of complex computation and message passing between agents, which makes it important to optimize both single-node performance and communication across nodes. To address both challenges, BRACE includes a high-level language called BRASIL (the Big Red Agent SImulation Language). BRASIL provides object-oriented features for programming simulations but can be compiled to a dataflow representation for automatic parallelization and optimization. We show that, by applying various optimization techniques, we can achieve both scalability and single-node performance similar to that of a hand-coded simulation.
