Leveraging Hierarchical Data Locality in Parallel Programming Models

We propose a novel framework that extends locality-aware parallel programming models with a hierarchical data locality model. We also propose a hierarchical thread partitioning algorithm that synthesizes hierarchical thread placement layouts aimed at minimizing the program's overall communication cost. We demonstrate the effectiveness of our approach on the NAS Parallel Benchmarks, implemented in the Unified Parallel C (UPC) language and built with a modified Berkeley UPC Compiler and runtime system. Applying the placement layouts suggested by our algorithm improves performance by up to 85%.
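The following minimal C sketch shows what "minimizing the program's overall communication cost" over a placement layout can mean. It rests on our own assumptions, not on the framework's: a three-level machine hierarchy (core, socket, node), one hypothetical per-byte cost for each level, and a precomputed thread-to-thread communication volume matrix. The hypothetical function placement_cost scores a layout. The hypothetical improve_by_swaps runs a simple greedy pairwise-swap local search. It only illustrates the objective and is not the hierarchical thread partitioning algorithm proposed here.

```c
#include <stddef.h>

/* Hypothetical machine hierarchy: cores are grouped into sockets,
   and sockets are grouped into nodes. Costs are per byte transferred. */
typedef struct {
    int cores_per_socket;
    int sockets_per_node;
    double cost_same_socket; /* between distinct cores on one socket */
    double cost_same_node;   /* between sockets on one node */
    double cost_remote;      /* between nodes */
} hierarchy_t;

/* Per-byte cost between two cores, chosen by the closest hierarchy
   level they share. Two threads on the same core cost nothing. */
static double level_cost(const hierarchy_t *h, int core_a, int core_b) {
    int socket_a = core_a / h->cores_per_socket;
    int socket_b = core_b / h->cores_per_socket;
    int node_a = socket_a / h->sockets_per_node;
    int node_b = socket_b / h->sockets_per_node;
    if (core_a == core_b) return 0.0;
    if (socket_a == socket_b) return h->cost_same_socket;
    if (node_a == node_b) return h->cost_same_node;
    return h->cost_remote;
}

/* Total communication cost of a placement layout.
   volume: n x n row-major matrix, volume[i*n + j] = bytes thread i sends to thread j.
   placement[i]: core assigned to thread i. */
double placement_cost(const hierarchy_t *h, int n, const double *volume,
                      const int *placement) {
    double total = 0.0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (i != j)
                total += volume[(size_t)i * n + j] *
                         level_cost(h, placement[i], placement[j]);
    return total;
}

/* Greedy local search: swap the cores of two threads whenever the swap
   lowers the total cost, and repeat until no swap helps.
   Modifies placement in place and returns the final cost. */
double improve_by_swaps(const hierarchy_t *h, int n, const double *volume,
                        int *placement) {
    double best = placement_cost(h, n, volume, placement);
    int improved = 1;
    while (improved) {
        improved = 0;
        for (int a = 0; a < n; a++) {
            for (int b = a + 1; b < n; b++) {
                int old_a = placement[a];
                placement[a] = placement[b];
                placement[b] = old_a;
                double cost = placement_cost(h, n, volume, placement);
                if (cost < best) {
                    best = cost;
                    improved = 1;
                } else {
                    /* Undo the swap: it did not reduce the cost. */
                    placement[b] = placement[a];
                    placement[a] = old_a;
                }
            }
        }
    }
    return best;
}
```

Consider four threads on two nodes, where threads 0 and 2 exchange most of their data, as do threads 1 and 3. Starting from the identity layout, the swap search moves each heavily communicating pair onto a shared node. Pairwise swaps only reach a local optimum. A hierarchical approach could instead partition threads level by level, first across nodes and then within each node.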