Accelerating Distributed Graphical Fluid Simulations with Micro‐partitioning

Graphical fluid simulations are CPU-bound. Parallelizing a simulation across hundreds of cores in the cloud would make it faster, but doing so requires evenly balancing load across nodes. Good load balancing currently depends either on manual decisions by experts, which are time-consuming and error-prone, or on dynamic approaches that estimate and react to future load, which are non-deterministic and hard to debug.
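If nodes synchronize at each timestep, every node waits for the slowest one. The cost of imbalance is therefore the ratio of the maximum node load to the mean load. The toy sketch below illustrates the intuition behind splitting a domain into many more partitions than nodes: each node then receives a mix of busy and idle regions. It is not the paper's algorithm. The 1D row of cells, the unit per-cell cost, and the cyclic assignment `p % num_nodes` are all hypothetical choices made for illustration.

```python
"""Toy illustration: many small partitions per node can even out load.

Assumptions (hypothetical, not taken from the paper): a 1D row of grid
cells, a per-cell cost of 1 where fluid is present and 0 elsewhere, and a
cyclic (round-robin) assignment of partitions to nodes.
"""
from typing import List


def node_loads(cell_cost: List[int], num_nodes: int, num_partitions: int) -> List[int]:
    """Split cells into equal contiguous partitions, assign partition p to
    node p % num_nodes, and return the total cost on each node."""
    if len(cell_cost) % num_partitions != 0:
        raise ValueError("cell count must be divisible by partition count")
    size = len(cell_cost) // num_partitions
    loads = [0] * num_nodes
    for p in range(num_partitions):
        loads[p % num_nodes] += sum(cell_cost[p * size:(p + 1) * size])
    return loads


def imbalance(loads: List[int]) -> float:
    """Max load divided by mean load; 1.0 means perfectly balanced.
    If every node waits for the slowest one each step, this ratio is the
    slowdown relative to perfect balance (ignoring communication)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean


if __name__ == "__main__":
    # 64 cells, with fluid occupying the first 24 (a hypothetical pool of water).
    cost = [1] * 24 + [0] * 40
    coarse = node_loads(cost, num_nodes=4, num_partitions=4)   # one partition per node
    micro = node_loads(cost, num_nodes=4, num_partitions=32)   # eight partitions per node
    print(coarse, imbalance(coarse))  # [16, 8, 0, 0] 2.6666666666666665
    print(micro, imbalance(micro))    # [6, 6, 6, 6] 1.0
```

Cyclic assignment balances perfectly here only because the toy load is uniform inside the fluid region. Real fluid load is irregular and moves over time. Scattering small partitions across nodes also increases boundary communication, so the partition count trades balance against communication cost.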