Experiences with Mesh-like computations using Prediction Binary Trees

In this paper we exploit the temporal coherence among successive phases of a computation to implement a load-balancing technique for mesh-like computations mapped onto a cluster of processors. The load-balancing scheme is built on a Predictor component, which estimates the load imbalance between successive phases. Using this information, our method partitions the computation into balanced tasks by means of the Prediction Binary Tree (PBT). At each new phase, the current PBT is updated by taking the computing time of each task in the previous phase as its cost estimate for the next phase. The PBT is designed to balance the load across tasks and to reduce dependencies among processors, which improves performance. Dependencies are reduced by using rectangular tiles of the mesh with an almost-square shape (i.e., one dimension is at most twice the other). With fewer dependencies, one can reduce inter-processor communication or exploit local dependencies among tasks (such as data locality). We also provide two heuristics that take advantage of data locality. Our strategy has been assessed on a significant problem, Parallel Ray Tracing. Our implementation shows good scalability and improves performance both on cheaper commodity clusters and on high-performance clusters with low-latency networks. We report several measurements showing that task granularity is a key factor in the performance of our decomposition/mapping strategy.
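
The partitioning idea described above can be illustrated with a minimal sketch. It is not the authors' PBT implementation. It assumes a rectangular mesh, a power-of-two number of tiles, and a predictor that spreads each tile's previous-phase time uniformly over its cells. The names `split`, `predict_cost`, and `region_cost` are hypothetical. The sketch also rebuilds the partition from scratch at each phase, whereas the paper updates the existing tree. The almost-square constraint is enforced on a best-effort basis and may not hold for small or odd tile sizes.

```python
from typing import List, Tuple

Tile = Tuple[int, int, int, int]  # (x, y, width, height) in mesh cells


def region_cost(cost: List[List[float]], tile: Tile) -> float:
    """Sum the predicted per-cell cost over a tile."""
    x, y, w, h = tile
    return sum(cost[r][c] for r in range(y, y + h) for c in range(x, x + w))


def split(cost: List[List[float]], tile: Tile, leaves: int) -> List[Tile]:
    """Recursively bisect `tile` into `leaves` tiles (assumed a power of two)."""
    x, y, w, h = tile
    if leaves == 1 or max(w, h) < 2:
        return [tile]
    horizontal = w >= h  # always cut across the longer side
    long_side, short_side = (w, h) if horizontal else (h, w)
    # Cut positions that leave both children at least short_side/2 long,
    # so that an almost-square parent yields almost-square children.
    lo = (short_side + 1) // 2
    hi = long_side - lo
    if lo > hi:  # no feasible cut: fall back to the midpoint
        lo = hi = long_side // 2

    def children(cut: int) -> Tuple[Tile, Tile]:
        if horizontal:
            return (x, y, cut, h), (x + cut, y, w - cut, h)
        return (x, y, w, cut), (x, y + cut, w, h - cut)

    def imbalance(cut: int) -> float:
        a, b = children(cut)
        return abs(region_cost(cost, a) - region_cost(cost, b))

    # Among the allowed cuts, pick the one that best balances predicted cost.
    a, b = children(min(range(lo, hi + 1), key=imbalance))
    return split(cost, a, leaves // 2) + split(cost, b, leaves // 2)


def predict_cost(tiles: List[Tile], times: List[float],
                 width: int, height: int) -> List[List[float]]:
    """Assumed predictor: next-phase cost of a cell is its tile's previous
    time divided evenly among the tile's cells."""
    cost = [[0.0] * width for _ in range(height)]
    for (x, y, w, h), t in zip(tiles, times):
        for r in range(y, y + h):
            for c in range(x, x + w):
                cost[r][c] = t / (w * h)
    return cost


# Phase 0: no measurements yet, so assume uniform cost on an 8x8 mesh.
uniform = [[1.0] * 8 for _ in range(8)]
tiles0 = split(uniform, (0, 0, 8, 8), 4)

# Phase 1: made-up timings in which the first tile was four times slower.
times0 = [4.0, 1.0, 1.0, 1.0]
predicted = predict_cost(tiles0, times0, 8, 8)
tiles1 = split(predicted, (0, 0, 8, 8), 4)
```

In this sketch the slow tile shrinks in the second partition, so the largest predicted tile cost drops below the slowest measured time of the previous phase. The cut range is what keeps tiles from degenerating into thin strips, at the price of a less exact balance.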
