Speculatively exploiting cross-invocation parallelism
Soumyadeep Ghosh | David I. August | Jialu Huang | Thomas B. Jablin | Jae W. Lee | Sotiris Apostolakis | Prakash Prabhu
[1] Kunle Olukotun,et al. Transactional memory coherence and consistency , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[2] L.M. Ni,et al. Trapezoid Self-Scheduling: A Practical Scheduling Scheme for Parallel Compilers , 1993, IEEE Trans. Parallel Distributed Syst.
[3] Evangelos P. Markatos,et al. Using processor affinity in loop scheduling on shared-memory multiprocessors , 1992, Supercomputing '92.
[4] Gregory T. Byrd,et al. On the exploitation of value prediction and producer identification to reduce barrier synchronization time , 2001, Proceedings 15th International Parallel and Distributed Processing Symposium. IPDPS 2001.
[5] Yun Zhang,et al. Decoupled software pipelining creates parallelization opportunities , 2010, CGO '10.
[6] Ken Kennedy,et al. Optimizing Compilers for Modern Architectures: A Dependence-based Approach , 2001.
[7] Bradley C. Kuszmaul,et al. Cilk: an efficient multithreaded runtime system , 1995, PPOPP '95.
[8] Josep Torrellas,et al. Speculative synchronization: applying thread-level speculation to explicitly parallel applications , 2002, ASPLOS X.
[9] Maged M. Michael,et al. RingSTM: scalable transactions with a single atomic instruction , 2008, SPAA '08.
[10] Paul E. McKenney,et al. Memory Barriers: a Hardware View for Software Hackers , 2010.
[11] Antonia Zhai,et al. The STAMPede approach to thread-level speculation , 2005, TOCS.
[12] Joel H. Saltz,et al. Run-time parallelization and scheduling of loops , 1989, SPAA '89.
[13] Sriram Krishnamoorthy,et al. Solving Large, Irregular Graph Problems Using Adaptive Work-Stealing , 2008, 2008 37th International Conference on Parallel Processing.
[14] Nancy M. Amato,et al. A scalable method for run-time loop parallelization , 1995, International Journal of Parallel Programming.
[15] Arturo González-Escribano,et al. The OpenMP source code repository , 2005, 13th Euromicro Conference on Parallel, Distributed and Network-Based Processing.
[16] Michael F. P. O'Boyle,et al. Synchronization Minimization in a SPMD Execution Model , 1995, J. Parallel Distributed Comput.
[17] Guilherme Ottoni,et al. Automatic thread extraction with decoupled software pipelining , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).
[18] Emery D. Berger,et al. Grace: safe multithreaded programming for C/C++ , 2009, OOPSLA '09.
[19] Yun Zhang,et al. Commutative set: a language extension for implicit parallel programming , 2011, PLDI '11.
[20] David A. Wood,et al. LogTM-SE: Decoupling Hardware Transactional Memory from Caches , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[21] Vivek Sarkar,et al. Reducing task creation and termination overhead in explicitly parallel programs , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[22] Gu-Yeon Wei,et al. HELIX: automatic parallelization of irregular programs for chip multiprocessing , 2012, CGO '12.
[23] Nir Shavit,et al. Software transactional memory , 1995, PODC '95.
[24] Alok Choudhary,et al. Runtime compilation techniques for data partitioning and communication schedule reuse , 1993, Supercomputing '93.
[25] Chau-Wen Tseng,et al. Improving Compiler and Run-Time Support for Irregular Reductions Using Local Writes , 1998, LCPC.
[27] Scott A. Mahlke,et al. Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory , 2009, PLDI '09.
[28] Constantine D. Polychronopoulos,et al. Guided Self-Scheduling: A Practical Scheduling Scheme for Parallel Supercomputers , 1987, IEEE Transactions on Computers.
[29] Alejandro Duran,et al. Unrolling Loops Containing Task Parallelism , 2009, LCPC.
[30] Koichi Wada,et al. Barrier Elimination Based on Access Dependency Analysis for OpenMP , 2006, ISPA.
[31] Lawrence Rauchwerger,et al. The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization , 1995, PLDI '95.
[32] Rajiv Gupta. The fuzzy barrier: a mechanism for high speed synchronization of processors , 1989, ASPLOS III.
[33] Chau-Wen Tseng,et al. Compiler optimizations for eliminating barrier synchronization , 1995, PPOPP '95.
[34] Andrew Brownsword,et al. Synchronization via scheduling: techniques for efficiently managing shared state , 2011, PLDI '11.
[35] Maurice Herlihy,et al. Transactional Memory: Architectural Support For Lock-free Data Structures , 1993, Proceedings of the 20th Annual International Symposium on Computer Architecture.
[36] David H. Bailey,et al. The NAS Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl.
[37] Rajiv Gupta,et al. ECMon: exposing cache events for monitoring , 2009, ISCA '09.
[38] Rajiv Gupta,et al. Copy or Discard execution model for speculative parallelization on multicores , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.
[39] Ron Cytron,et al. Doacross: Beyond Vectorization for Multiprocessors , 1986, ICPP.
[40] Alan Mycroft,et al. Software thread-level speculation: an optimistic library implementation , 2008, IWMSE '08.
[41] William R. Dieter,et al. User-Level Checkpointing for LinuxThreads Programs , 2001, USENIX Annual Technical Conference, FREENIX Track.
[42] Soumyadeep Ghosh,et al. Enabling Efficient Alias Speculation , 2015, LCTES.
[43] Suresh Jagannathan,et al. Speculative N-Way barriers , 2009, DAMP '09.
[44] Chen Ding,et al. Software behavior oriented parallelization , 2007, PLDI '07.
[45] Josep Torrellas,et al. Bulk Disambiguation of Speculative Threads in Multiprocessors , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[46] Vipin Chaudhary,et al. Minimum dependence distance tiling of nested loops with non-uniform dependences , 1994, Proceedings of 1994 6th IEEE Symposium on Parallel and Distributed Processing.
[47] David R. Butenhof. Programming with POSIX threads , 1993.
[48] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[49] Rajiv Gupta,et al. Speculative Optimizations for Parallel Programs on Multicores , 2009, LCPC.
[50] Michael Voss,et al. Optimization via Reflection on Work Stealing in TBB , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.