The Cilkprof Scalability Profiler
Bradley C. Kuszmaul | Charles E. Leiserson | Tao B. Schardl | I-Ting Angelina Lee | William M. Leiserson