Term Rewriting on GPUs

We present a way to implement term rewriting on a GPU: the GPU repeatedly performs a massively parallel evaluation of all subterms. We find that if a term rewrite system exhibits sufficient internal parallelism, GPU rewriting substantially outperforms rewriting on the CPU. We expect that our implementation can be optimized further, and GPUs will in any case become much more powerful, so these results suggest that GPUs are an interesting platform for term rewriting. Because term rewriting can be viewed as a universal programming language, this also opens a route towards programming GPUs by term rewriting, especially for irregular computations.
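
To make the idea of repeated parallel sweeps concrete, the following is a minimal sketch in plain Python. It is a sequential simulation and not the GPU implementation described here. The term encoding, the single rule add(num a, num b) -> num(a+b), and all function names are assumptions chosen for illustration. Each call to parallel_step rewrites every redex present at the start of the sweep, which stands in for one parallel pass on the GPU.

```python
# Terms are nested tuples: ("num", 3) or ("add", left, right).
# Hypothetical encoding and rule set, chosen only for illustration.

def rewrite_root(term):
    """Apply add(num a, num b) -> num(a+b) at the root, or return None."""
    if term[0] == "add" and term[1][0] == "num" and term[2][0] == "num":
        return ("num", term[1][1] + term[2][1])
    return None

def parallel_step(term):
    """One sweep: rewrite every redex that exists when the sweep starts.

    Returns the new term and the number of rewrites performed.
    On a GPU each subterm would be handled by its own thread; here the
    subterms are simply visited one after another.
    """
    reduced = rewrite_root(term)
    if reduced is not None:
        return reduced, 1
    if term[0] == "add":
        left, n_left = parallel_step(term[1])
        right, n_right = parallel_step(term[2])
        return ("add", left, right), n_left + n_right
    return term, 0

def normalize(term):
    """Repeat sweeps until no rule applies; count sweeps and rewrites."""
    sweeps = 0
    rewrites = 0
    while True:
        term, n = parallel_step(term)
        if n == 0:
            return term, sweeps, rewrites
        sweeps += 1
        rewrites += n

def balanced(depth):
    """Build a balanced tree of additions with 2**depth leaves, each num 1."""
    if depth == 0:
        return ("num", 1)
    return ("add", balanced(depth - 1), balanced(depth - 1))

result, sweeps, rewrites = normalize(balanced(3))
print(result, sweeps, rewrites)
```

For a balanced tree of depth 3, the sketch performs 7 rewrites in 3 sweeps. The gap between the number of rewrites and the number of sweeps is a rough measure of the internal parallelism that a parallel evaluator could exploit; a term shaped like a linear chain would need as many sweeps as rewrites.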
