Throughput-Driven Partitioning of Stream Programs on Heterogeneous Distributed Systems

Graph partitioning is an important problem in computer science and is NP-hard, so in practice it is usually solved with heuristics. In this article we apply graph partitioning to the workload of stream programs in order to optimise throughput on heterogeneous distributed platforms. Existing graph partitioning heuristics are not adequate for this problem domain, so we present two new heuristics that capture the problem space of throughput-oriented partitioning of stream programs. The first, KL-Adapted (KLA), is an adaptation of the well-known Kernighan-Lin algorithm and is relatively slow. The second, the Congestion Avoidance (CA) partitioning algorithm, performs reconfiguration moves optimised for our problem type. We compare both KLA and CA with the generic meta-heuristic Simulated Annealing (SA). All three methods achieve similar throughput in most cases, but their calculation times differ significantly. For small graphs KLA is faster than SA, but for larger graphs it is slower. CA, on the other hand, is always orders of magnitude faster than both KLA and SA, even for large graphs. This makes CA potentially useful for re-partitioning systems at runtime.
