Paper Information - Easy PRAM-Based High-Performance Parallel Programming with ICE

Parallel machines have become more widely used. Unfortunately, parallel programming technologies have advanced at a much slower pace, except for regular programs. For irregular programs, this advancement is inhibited by high synchronization costs, non-loop parallelism, non-array data structures, recursively expressed parallelism, and parallelism that is too fine-grained to be exploitable. We present ICE, a new parallel programming language that is easy to program because: (i) ICE is a synchronous, lock-step language, so there is no need for programmer-specified synchronization; (ii) the ICE program for a PRAM algorithm amounts to a direct transcription of that algorithm; and (iii) PRAM algorithmic theory offers a unique wealth of parallel algorithms and techniques. We propose ICE as part of an ecosystem consisting of the XMT architecture, the PRAM algorithmic model, and ICE itself, which together deliver on the twin goals of easy programming and efficient parallelization of irregular programs. The XMT architecture, developed at UMD, can exploit fine-grained parallelism in irregular programs. We have built the ICE compiler, which translates the ICE language into the multithreaded XMTC language. This is significant because multi-threading is a feature shared by practically all current scalable parallel programming languages, so the translation provides a general method to compile ICE code. As one indication of ease of programming, we observed a reduction in code size in 11 out of 16 benchmarks compared to hand-optimized XMTC. For these 11 programs, the average reduction in the number of lines of code was 35.5 percent. The remaining 5 benchmarks had almost the same code size in ICE and in hand-optimized XMTC. Our main result is perhaps surprising: the run-time of ICE was comparable to that of XMTC, with a 0.53 percent average gain for ICE across all benchmarks.
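
To make the lock-step idea in point (i) concrete, the following is a minimal sketch in Python, not ICE or XMTC syntax, which the abstract does not show. It simulates a PRAM-style doubling prefix sum in which every virtual processor reads from the same snapshot before any write becomes visible, so no explicit synchronization appears in the algorithm itself. The function name `lockstep_prefix_sum` and the choice of prefix sums as the example are assumptions made for illustration only.

```python
def lockstep_prefix_sum(values):
    """Inclusive prefix sums via PRAM-style doubling, simulated sequentially.

    Illustrative only: this mimics lock-step semantics in plain Python and
    is not how ICE programs are written or compiled.
    """
    current = list(values)
    n = len(current)
    stride = 1
    while stride < n:
        # One synchronous step: every virtual processor i reads from the
        # old snapshot `current`; all results land in a fresh list, so no
        # processor can observe another processor's write from this step.
        current = [
            current[i] + current[i - stride] if i >= stride else current[i]
            for i in range(n)
        ]
        stride *= 2
    return current


if __name__ == "__main__":
    print(lockstep_prefix_sum([3, 1, 4, 1, 5]))  # [3, 4, 8, 9, 14]
```

Building a fresh list on each step is what stands in for the lock-step guarantee here: updating `current` in place would let later positions see values already written in the same step, which is the kind of hazard a programmer would otherwise have to guard against by hand.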
