Parallel cache-efficient code for computing the McCaskill partition functions

We present parallel, tiled, optimized code that computes McCaskill's partition functions, a CPU- and memory-intensive dynamic programming task in computational biology. To optimize the code, we use our own source-to-source compiler, TRACO, and compare the performance of the resulting code with that of code generated by the state-of-the-art PLuTo compiler, which is based on the affine transformations framework (ATF). Although PLuTo generates tiled code with outstanding locality, it fails to parallelize that tiled code. The TRACO tiling strategy uses the transitive closure of a dependence graph, which avoids computing affine functions. The ISL scheduler is then used to parallelize the tiled loop nests. An experimental study carried out on a multi-core computer demonstrates considerable speed-up of the generated code for larger numbers of threads.
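To make the dependence pattern concrete, here is a minimal serial sketch of a partition function recurrence of the McCaskill type. It is not the code studied in the paper and not the full McCaskill algorithm: it assumes a simplified energy model in which every allowed base pair contributes the same energy, and the names `can_pair`, `partition_function`, `pair_energy`, `rt`, and `min_loop` are hypothetical. What it illustrates is that each cell `z[i][j]` reads a whole row segment and a whole column segment of the triangular table, which is the kind of non-uniform dependence that makes tiling and parallelization difficult.

```python
import math

# Assumption: canonical and wobble pairs only (illustrative pairing rule).
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                 ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    """Return True if bases a and b may form a pair in this toy model."""
    return (a, b) in ALLOWED_PAIRS


def partition_function(seq, pair_energy=-1.0, rt=0.6, min_loop=3):
    """Partition function of seq under a simplified per-pair energy model.

    Every base pair contributes the Boltzmann weight exp(-pair_energy / rt);
    at least min_loop unpaired bases must lie between paired positions.
    """
    n = len(seq)
    if n == 0:
        return 1.0
    weight = math.exp(-pair_energy / rt)
    z = [[1.0] * n for _ in range(n)]

    def get(i, j):
        # An empty interval (j < i) has exactly one structure: the empty one.
        return 1.0 if j < i else z[i][j]

    # Fill the upper triangle by increasing subsequence length.
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            total = get(i, j - 1)  # case: position j is unpaired
            for k in range(i, j - min_loop):  # case: j pairs with k
                if can_pair(seq[k], seq[j]):
                    total += get(i, k - 1) * weight * get(k + 1, j - 1)
            z[i][j] = total
    return z[0][n - 1]
```

With `pair_energy=0.0` every structure has weight 1, so the function simply counts the admissible secondary structures, which is a convenient sanity check for a sketch like this.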
