Automatically Distributing Eulerian and Hybrid Fluid Simulations in the Cloud

Distributing a simulation across many machines can drastically speed up computation and increase detail. The cloud provides tremendous computing resources, but its weak service guarantees force programs to manage significant system complexity: nodes, networks, and storage occasionally perform poorly or fail. We describe Nimbus, a system that automatically distributes grid-based and hybrid simulations across cloud computing nodes. The main simulation loop is sequential code that launches distributed computations across many cores. The simulation on each core runs as if it were stand-alone; Nimbus automatically stitches these simulations into a single, larger one. To do this efficiently, Nimbus introduces a four-layer data model that translates between the contiguous, geometric objects used by simulation libraries and the replicated, fine-grained objects managed by its underlying cloud computing runtime. Using PhysBAM particle level set fluid simulations, we demonstrate that Nimbus can run higher-detail simulations faster, distribute simulations across up to 512 cores, and run enormous simulations (1024³ cells). Nimbus automatically manages these distributed simulations, balancing load across nodes and recovering from failures. Implementations of PhysBAM water and smoke simulations, as well as an open-source heat-diffusion simulation, show that Nimbus is general and can support complex simulations. Nimbus can be downloaded from https://nimbus.stanford.edu.
