Asynchronous and Load-Balanced Union-Find for Distributed and Parallel Scientific Data Visualization and Analysis

We present a novel distributed union-find algorithm that combines asynchronous parallelism with k-d tree-based load balancing for scalable visualization and analysis of scientific data. Applications of union-find include level set extraction and critical point tracking. However, distributed union-find can suffer from high synchronization costs and from imbalanced workloads across parallel processes. In this study, we prove that the global synchronizations in existing distributed union-find algorithms can be eliminated without changing the final results. This allows communication and computation to overlap for scalable processing. We also use a k-d tree decomposition to redistribute inputs and improve load balance across processes. We benchmark the scalability of our algorithm with up to 1,024 processes using both synthetic and application data. We demonstrate our algorithm on critical point tracking in high-speed imaging experiments and on super-level set extraction in fusion plasma simulations.
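The abstract builds on the union-find (disjoint-set) data structure. Below is a minimal sequential, single-process sketch with path halving and union by rank. It is not the authors' distributed or asynchronous algorithm, and the names `UnionFind` and `label_components` are hypothetical. The sketch also checks, on a small hypothetical edge list, that the final partition does not depend on the order in which unions are applied. This order independence is a standard property of connected components. It is offered only as intuition for why reordering merges need not change final results, not as the paper's proof.

```python
import random


class UnionFind:
    """Sequential disjoint-set forest with path halving and union by rank."""

    def __init__(self, n):
        self.parent = list(range(n))  # each element starts as its own root
        self.rank = [0] * n           # upper bound on tree height per root

    def find(self, x):
        # Path halving: point each visited node at its grandparent while walking up.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1

    def components(self):
        # Group elements by root; sort to get a canonical partition
        # that does not depend on which element ended up as the root.
        groups = {}
        for x in range(len(self.parent)):
            groups.setdefault(self.find(x), []).append(x)
        return sorted(sorted(g) for g in groups.values())


def label_components(n, edges):
    """Hypothetical helper: connected components of n elements under the given edges."""
    uf = UnionFind(n)
    for a, b in edges:
        uf.union(a, b)
    return uf.components()


# Hypothetical example: 8 cells, with edges linking neighboring cells
# that both lie in a super-level set.
edges = [(0, 1), (1, 2), (4, 5), (6, 7), (5, 6)]
reference = label_components(8, edges)

# Applying the same unions in any order yields the same partition.
for seed in range(5):
    shuffled = edges[:]
    random.Random(seed).shuffle(shuffled)
    assert label_components(8, shuffled) == reference

print(reference)  # [[0, 1, 2], [3], [4, 5, 6, 7]]
```

In a distributed setting, the union operations would be spread across processes and exchanged as messages. The assertion loop only illustrates the sequential analogue of "same final result regardless of merge order."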
