A Transformation Framework for Optimizing Task-Parallel Programs
Vivek Sarkar | Jun Shirako | V. Krishna Nandivada | Jisheng Zhao