The Kremlin Oracle for Sequential Code Parallelization

The Kremlin open-source tool helps programmers by automatically identifying regions in sequential programs that merit parallelization. Kremlin combines a novel dynamic program analysis, hierarchical critical-path analysis, with multicore processor models to evaluate thousands of potential parallelization strategies and estimate their performance outcomes.

[1]  Vivek Sarkar,et al.  Space-time scheduling of instruction-level parallelism on a raw machine , 1998, ASPLOS VIII.

[2]  Saturnino Garcia,et al.  Kremlin: rethinking and rebooting gprof for the multicore age , 2011, PLDI '11.

[3]  Guilherme Ottoni,et al.  Automatic thread extraction with decoupled software pipelining , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).

[4]  Matteo Frigo,et al.  The implementation of the Cilk-5 multithreaded language , 1998, PLDI.

[5]  Todd M. Austin,et al.  Dynamic dependency analysis of ordinary programs , 1992, ISCA '92.

[6]  Monica S. Lam,et al.  Maximizing Multiprocessor Performance with the SUIF Compiler , 1996, Digit. Tech. J..

[7]  James R. Larus,et al.  Loop-Level Parallelism in Numeric and Symbolic Programs , 1993, IEEE Trans. Parallel Distributed Syst..

[8]  Michael F. P. O'Boyle,et al.  Towards a holistic approach to auto-parallelization: integrating profile-driven parallelism detection and machine-learning based mapping , 2009, PLDI '09.

[9]  C. Luk,et al.  Prospector : A Dynamic Data-Dependence Profiler To Help Parallel Programming , 2010 .

[10]  Saturnino Garcia,et al.  Kismet: parallel speedup estimates for serial programs , 2011, OOPSLA '11.

[11]  Yunheung Paek,et al.  Parallel Programming with Polaris , 1996, Computer.

[12]  Yuxiong He,et al.  The Cilkview scalability analyzer , 2010, SPAA '10.

[13]  David H. Bailey,et al.  The NAS parallel benchmarks summary and preliminary results , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).

[14]  Gu-Yeon Wei,et al.  HELIX: automatic parallelization of irregular programs for chip multiprocessing , 2012, CGO '12.

[15]  Manoj Kumar,et al.  Measuring Parallelism in Computation-Intensive Scientific/Engineering Applications , 1988, IEEE Trans. Computers.

[16]  Kevin Skadron,et al.  Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).

[17]  Pranith Kumar,et al.  Predicting Potential Speedup of Serial Code via Lightweight Profiling and Emulations with Memory Performance Model , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.