Kremlin: rethinking and rebooting gprof for the multicore age
Saturnino Garcia | Michael Bedford Taylor | Donghwan Jeon | Christopher M. Louie
[1] Chen Yang,et al. A cost-driven compilation framework for speculative parallelization of sequential programs , 2004, PLDI '04.
[2] Scott A. Mahlke,et al. Uncovering hidden loop level parallelism in sequential applications , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.
[3] L. Rauchwerger,et al. The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization , 1999, IEEE Trans. Parallel Distributed Syst.
[4] Nathan R. Tallent,et al. Effective performance measurement and analysis of multithreaded applications , 2009, PPoPP '09.
[5] Peng Wu,et al. Compiler-Driven Dependence Profiling to Guide Program Parallelization , 2008, LCPC.
[6] Manoj Kumar,et al. Measuring Parallelism in Computation-Intensive Scientific/Engineering Applications , 1988, IEEE Trans. Computers.
[7] Xiangyu Zhang,et al. Alchemist: A Transparent Dependence Distance Profiling Infrastructure , 2009, 2009 International Symposium on Code Generation and Optimization.
[8] Rajesh Bordawekar,et al. Modeling optimistic concurrency using quantitative dependence analysis , 2008, PPoPP.
[9] Wei Liu,et al. POSH: a TLS compiler that exploits program structure , 2006, PPoPP '06.
[10] Todd M. Austin,et al. Dynamic dependency analysis of ordinary programs , 1992, ISCA '92.
[11] Wilson C. Hsieh,et al. A framework for determining useful parallelism , 1988, ICS '88.
[12] Yunheung Paek,et al. Parallel Programming with Polaris , 1996, Computer.
[13] David H. Bailey,et al. The NAS Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl.
[14] Rajiv Gupta,et al. Timestamped whole program path representation and its applications , 2001, PLDI '01.
[15] James R. Larus,et al. Loop-Level Parallelism in Numeric and Symbolic Programs , 1993, IEEE Trans. Parallel Distributed Syst.
[16] Frank Tip,et al. Refactoring for reentrancy , 2009, ESEC/FSE '09.
[17] Qin Zhao,et al. Umbra: efficient and scalable memory shadowing , 2010, CGO '10.
[18] Charles E. Leiserson,et al. The Cilk++ concurrency platform , 2009, 2009 46th ACM/IEEE Design Automation Conference.
[19] Jesús Labarta,et al. Interfacing Computer Aided Parallelization and Performance Analysis , 2003, International Conference on Computational Science.
[20] Andreas Zeller,et al. Profiling Java programs for parallelism , 2009, 2009 ICSE Workshop on Multicore Software Engineering.
[21] Vivek Sarkar,et al. Space-time scheduling of instruction-level parallelism on a raw machine , 1998, ASPLOS VIII.
[22] Vikram S. Adve,et al. LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.
[23] Susan L. Graham,et al. gprof: a call graph execution profiler , 1982, SIGPLAN '82.
[24] Steven Hall,et al. Manipulating lossless video in the compressed domain , 2009, MM '09.
[25] Yuxiong He,et al. The Cilkview scalability analyzer , 2010, SPAA '10.
[26] Kunle Olukotun,et al. The Jrpm system for dynamically parallelizing Java programs , 2003, ISCA '03.
[27] Amer Diwan,et al. SUIF Explorer: an interactive and interprocedural parallelizer , 1999, PPoPP '99.
[28] J. Mark Bull,et al. A microbenchmark suite for OpenMP 2.0 , 2001, CARN.
[29] Ken Kennedy,et al. Interactive Parallel Programming using the ParaScope Editor , 1991, IEEE Trans. Parallel Distributed Syst.
[30] Chen Ding,et al. Fast Track: A Software System for Speculative Program Optimization , 2009, 2009 International Symposium on Code Generation and Optimization.
[31] Rajiv Gupta,et al. Copy or Discard execution model for speculative parallelization on multicores , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.
[32] Xiangyu Zhang,et al. Efficient online detection of dynamic control dependence , 2007, ISSTA '07.
[33] Thomas E. Anderson,et al. Quartz: a tool for tuning parallel program performance , 1990, SIGMETRICS '90.
[34] Michael D. Ernst,et al. Refactoring sequential Java code for concurrency via concurrent libraries , 2009, 2009 IEEE 31st International Conference on Software Engineering.
[35] Monica S. Lam,et al. Maximizing Multiprocessor Performance with the SUIF Compiler , 1996, Digit. Tech. J.
[36] P. Sadayappan,et al. Understanding parallelism-inhibiting dependences in sequential Java programs , 2010, 2010 IEEE International Conference on Software Maintenance.
[37] Vivek Sarkar,et al. X10: concurrent programming for modern architectures , 2007, PPoPP.
[38] Yoichi Muraoka,et al. On the Number of Operations Simultaneously Executable in Fortran-Like Programs and Their Resulting Speedup , 1972, IEEE Transactions on Computers.
[39] Keshav Pingali,et al. How much parallelism is there in irregular applications? , 2009, PPoPP '09.
[40] Hyesoon Kim,et al. SD3: A Scalable Approach to Dynamic Data-Dependence Profiling , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[41] Nicholas Nethercote,et al. Valgrind: a framework for heavyweight dynamic binary instrumentation , 2007, PLDI '07.
[42] Michael F. P. O'Boyle,et al. Towards a holistic approach to auto-parallelization: integrating profile-driven parallelism detection and machine-learning based mapping , 2009, PLDI '09.
[43] Serge J. Belongie,et al. SD-VBS: The San Diego Vision Benchmark Suite , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[44] Saturnino Garcia,et al. Kremlin: like gprof, but for parallelization , 2011, PPoPP '11.