Parallel Tiled Cache and Energy Efficient Code for Zuker's RNA Folding

In this paper, we consider Zuker’s RNA folding algorithm, a dynamic programming task that is challenging to optimize because it is resource intensive and exhibits a large number of non-uniform dependences. We apply an approach that we published previously to automatically tile and parallelize each loop in the Zuker RNA folding loop nest, which lies within the polyhedral model. First, for each statement of the loop nest, rectangular tiles are formed within the iteration space. Then, those tiles are corrected so that they honor all dependences present in the original loop nest. The correction is based on applying the exact transitive closure of the dependence graph. We implemented our approach as part of the source-to-source TRACO compiler. We compare the performance and energy consumption of the generated code with those of code produced by the state-of-the-art PluTo compiler, which is based on the affine transformation framework, and with those of code obtained by means of the cache-efficient manual method Transpose. Experiments carried out on a modern multi-core processor demonstrate a significant locality improvement and energy saving for the generated code.
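The two steps named above (form rectangular tiles, then correct them with the transitive closure of the dependence graph) can be illustrated on a toy example. The sketch below is not the TRACO implementation and does not use Zuker's recurrence: it enumerates a small 2-D iteration space explicitly, assumes a single hypothetical dependence pattern (i, j) -> (i+1, j-1), and applies one set-based formulation of tile correction that is assumed here for illustration. All names (`transitive_closure`, `corrected_tiles`, `demo`) are hypothetical.

```python
from itertools import product


def transitive_closure(succ):
    """Map each iteration to every iteration reachable through one or more
    dependences. Assumes the dependence graph is acyclic."""
    reach = {}

    def visit(p):
        if p in reach:
            return reach[p]
        out = set()
        for q in succ[p]:
            out.add(q)
            out |= visit(q)
        reach[p] = out
        return out

    for p in succ:
        visit(p)
    return reach


def corrected_tiles(points, deps, tile_of):
    """Correct rectangular tiles so that running the tiles in lexicographic
    order of their identifiers honors every dependence in `deps`."""
    succ = {p: [] for p in points}
    for src, dst in deps:
        succ[src].append(dst)
    reach = transitive_closure(succ)

    ids = sorted({tile_of(p) for p in points})
    tiles = {t: {p for p in points if tile_of(p) == t} for t in ids}

    def image(iterations):
        # R+(S): everything that transitively depends on an iteration in S.
        out = set()
        for p in iterations:
            out |= reach[p]
        return out

    result = {}
    for t in ids:
        tile_lt = set().union(*(tiles[u] for u in ids if u < t))
        tile_gt = set().union(*(tiles[u] for u in ids if u > t))
        r_gt = image(tile_gt)
        # Drop iterations that must wait for a lexicographically greater tile.
        tile_itr = tiles[t] - r_gt
        # Adopt iterations of smaller tiles that depend on this tile
        # and on no greater tile.
        tvld_lt = (image(tile_itr) & tile_lt) - r_gt
        result[t] = tile_itr | tvld_lt
    return result


def demo(n=6, b=3):
    """Toy n x n iteration space, b x b rectangular tiles, and the assumed
    dependence (i, j) -> (i+1, j-1), which crosses tile borders backwards."""
    points = list(product(range(n), range(n)))
    space = set(points)
    deps = [((i, j), (i + 1, j - 1)) for (i, j) in points
            if (i + 1, j - 1) in space]
    tiles = corrected_tiles(points, deps, lambda p: (p[0] // b, p[1] // b))
    return points, deps, tiles


if __name__ == "__main__":
    _, _, tiles = demo()
    for tile_id in sorted(tiles):
        print(tile_id, sorted(tiles[tile_id]))
```

In this toy setting, each iteration ends up in the lexicographically greatest tile that holds it or one of its transitive predecessors, so no dependence points from a later tile to an earlier one. Within a corrected tile, iterations can still be run in their original lexicographic order, because the assumed dependence vector (1, -1) is lexicographically positive. A compiler operating in the polyhedral model would manipulate symbolic sets and relations instead of enumerating points.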
