Parallel speedup estimates for serial programs
[1] Saturnino Garcia,et al. Kremlin: like gprof, but for parallelization , 2011, PPoPP '11.
[2] James C. Hoe,et al. Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs? , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[3] Gabriel H. Loh. A time-stamping algorithm for efficient performance estimation of superscalar processors , 2001, SIGMETRICS '01.
[4] Margaret Martonosi,et al. Integrating performance monitoring and communication in parallel computers , 1996, SIGMETRICS '96.
[5] Thomas E. Anderson,et al. Quartz: a tool for tuning parallel program performance , 1990, SIGMETRICS '90.
[6] Wenguang Chen,et al. PHANTOM: predicting performance of parallel applications on large-scale parallel machines using a single node , 2010, PPoPP '10.
[7] Henry Hoffmann,et al. Evaluation of the Raw microprocessor: an exposed-wire-delay architecture for ILP and streams , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[8] Qin Zhao,et al. Practical memory checking with Dr. Memory , 2011, International Symposium on Code Generation and Optimization (CGO 2011).
[9] D.A. Reed,et al. An Integrated Compilation and Performance Analysis Environment for Data Parallel Programs , 1995, Proceedings of the IEEE/ACM SC95 Conference.
[10] Mark N. Wegman,et al. Efficiently computing static single assignment form and the control dependence graph , 1991, TOPLAS.
[11] Manoj Kumar,et al. Measuring Parallelism in Computation-Intensive Scientific/Engineering Applications , 1988, IEEE Trans. Computers.
[12] David W. Wall,et al. Limits of instruction-level parallelism , 1991, ASPLOS IV.
[13] C. Luk,et al. Prospector: A Dynamic Data-Dependence Profiler To Help Parallel Programming , 2010 .
[14] Anant Agarwal,et al. Scalar operand networks , 2005, IEEE Transactions on Parallel and Distributed Systems.
[15] Vikram S. Adve,et al. LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004.
[16] Kunle Olukotun,et al. Exposing speculative thread parallelism in SPEC2000 , 2005, PPoPP.
[17] Pranith Kumar,et al. Predicting Potential Speedup of Serial Code via Lightweight Profiling and Emulations with Memory Performance Model , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[18] Vivek Sarkar,et al. Space-time scheduling of instruction-level parallelism on a raw machine , 1998, ASPLOS VIII.
[19] Wei Xu,et al. Taint-Enhanced Policy Enforcement: A Practical Approach to Defeat a Wide Range of Attacks , 2006, USENIX Security Symposium.
[20] Nicholas Nethercote,et al. How to shadow every byte of memory used by a program , 2007, VEE '07.
[21] Keshav Pingali,et al. How much parallelism is there in irregular applications? , 2009, PPoPP '09.
[22] Hyesoon Kim,et al. SD3: A Scalable Approach to Dynamic Data-Dependence Profiling , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[23] Mark D. Hill,et al. Amdahl's Law in the Multicore Era , 2008, Computer.
[24] David H. Bailey,et al. The NAS parallel benchmarks summary and preliminary results , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[25] Guang R. Gao,et al. On the limits of program parallelism and its smoothability , 1992, MICRO.
[26] Lawrence Rauchwerger,et al. Measuring limits of parallelism and characterizing its vulnerability to resource constraints , 1993, Proceedings of the 26th Annual International Symposium on Microarchitecture.
[27] Henry Hoffmann,et al. The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs , 2002, IEEE Micro.
[28] Ben Lee,et al. Performance Evaluation of Dynamic Speculative Multithreading with the Cascadia Architecture , 2010, IEEE Transactions on Parallel and Distributed Systems.
[29] Cheng Wang,et al. LIFT: A Low-Overhead Practical Information Flow Tracking System for Detecting Security Attacks , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[30] Michael Bedford Taylor,et al. Tiled microprocessors , 2007 .
[31] Qin Zhao,et al. Umbra: efficient and scalable memory shadowing , 2010, CGO '10.
[32] Ware Myers. Supercomputing 91 , 1992 .
[33] Feng Liu,et al. Scalable Speculative Parallelization on Commodity Clusters , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[34] Yuxiong He,et al. The Cilkview scalability analyzer , 2010, SPAA '10.
[35] Vivek Sarkar,et al. The Raw Compiler Project , 1999 .
[36] James E. Smith,et al. A first-order superscalar processor model , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004.
[37] David Wentzlaff,et al. Processor: A 64-Core SoC with Mesh Interconnect , 2010 .
[38] James R. Larus,et al. Exploiting hardware performance counters with flow and context sensitive profiling , 1997, PLDI '97.
[39] Susan L. Graham,et al. Gprof: A call graph execution profiler , 1982, SIGPLAN '82.
[40] Daniel A. Reed,et al. SvPablo: A multi-language architecture-independent performance analysis system , 1999, Proceedings of the 1999 International Conference on Parallel Processing.
[41] Xiangyu Zhang,et al. Alchemist: A Transparent Dependence Distance Profiling Infrastructure , 2009, 2009 International Symposium on Code Generation and Optimization.
[42] Easwaran Raman,et al. Parallel-stage decoupled software pipelining , 2008, CGO '08.
[43] Victor Lee,et al. The RAW benchmark suite: computation structures for general purpose computing , 1997, Proceedings. The 5th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (Cat. No.97TB100186).
[44] Barton P. Miller,et al. The Paradyn Parallel Performance Measurement Tool , 1995, Computer.
[45] Saturnino Garcia,et al. Kremlin: rethinking and rebooting gprof for the multicore age , 2011, PLDI '11.
[46] Nicholas Nethercote,et al. Valgrind: a framework for heavyweight dynamic binary instrumentation , 2007, PLDI '07.
[47] Saturnino Garcia,et al. Parkour: Parallel Speedup Estimates for Serial Programs , 2011, HotPar.
[48] Martin Schulz,et al. A regression-based approach to scalability prediction , 2008, ICS '08.
[49] Todd M. Austin,et al. Dynamic dependency analysis of ordinary programs , 1992, ISCA '92.
[50] Scott A. Mahlke,et al. Uncovering hidden loop level parallelism in sequential applications , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.
[51] Lieven Eeckhout,et al. Performance prediction based on inherent program similarity , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[52] Steven Swanson,et al. GreenDroid: A mobile application processor for a future of dark silicon , 2010, 2010 IEEE Hot Chips 22 Symposium (HCS).
[53] Amer Diwan,et al. SUIF Explorer: an interactive and interprocedural parallelizer , 1999, PPoPP '99.
[54] Qin Zhao,et al. Efficient memory shadowing for 64-bit architectures , 2010, ISMM '10.
[55] Vivek Sarkar,et al. Baring It All to Software: Raw Machines , 1997, Computer.
[56] Rajiv Gupta,et al. Timestamped whole program path representation and its applications , 2001, PLDI '01.
[57] James R. Larus,et al. Loop-Level Parallelism in Numeric and Symbolic Programs , 1993, IEEE Trans. Parallel Distributed Syst.
[58] John L. Hennessy,et al. Efficient performance prediction for modern microprocessors , 2000, SIGMETRICS '00.
[59] Michael F. P. O'Boyle,et al. Towards a holistic approach to auto-parallelization: integrating profile-driven parallelism detection and machine-learning based mapping , 2009, PLDI '09.
[60] Bei Yu,et al. TaintTrace: Efficient Flow Tracing with Dynamic Binary Rewriting , 2006, 11th IEEE Symposium on Computers and Communications (ISCC'06).
[61] Nicholas Nethercote,et al. Using Valgrind to Detect Undefined Value Errors with Bit-Precision , 2005, USENIX Annual Technical Conference, General Track.
[62] J. Mark Bull,et al. A microbenchmark suite for OpenMP 2.0 , 2001, CARN.
[63] Li Zhao,et al. Exploring Large-Scale CMP Architectures Using ManySim , 2007, IEEE Micro.
[64] David H. Bailey,et al. The NAS Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl.
[65] Monica S. Lam,et al. Limits of control flow on parallelism , 1992, ISCA '92.
[66] Milo M. K. Martin,et al. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.
[67] Yoichi Muraoka,et al. On the Number of Operations Simultaneously Executable in Fortran-Like Programs and Their Resulting Speedup , 1972, IEEE Transactions on Computers.
[68] David Evans,et al. Towards Differential Program Analysis , 2022 .