Kismet: parallel speedup estimates for serial programs
[1] Michael Bedford Taylor,et al. Design decisions in the implementation of a raw architecture workstation , 1999 .
[2] Thomas E. Anderson,et al. Quartz: a tool for tuning parallel program performance , 1990, SIGMETRICS '90.
[3] Kunle Olukotun,et al. Exposing speculative thread parallelism in SPEC2000 , 2005, PPoPP.
[4] Victor Lee,et al. The RAW benchmark suite: computation structures for general purpose computing , 1997, Proceedings. The 5th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (Cat. No.97TB100186).
[5] Barton P. Miller,et al. The Paradyn Parallel Performance Measurement Tool , 1995, Computer.
[6] Henry Hoffmann,et al. Evaluation of the Raw microprocessor: an exposed-wire-delay architecture for ILP and streams , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..
[7] Nathan R. Tallent,et al. Effective performance measurement and analysis of multithreaded applications , 2009, PPoPP '09.
[8] Vivek Sarkar,et al. Space-time scheduling of instruction-level parallelism on a raw machine , 1998, ASPLOS VIII.
[9] Nicholas Nethercote,et al. How to shadow every byte of memory used by a program , 2007, VEE '07.
[10] Lawrence Rauchwerger,et al. Measuring limits of parallelism and characterizing its vulnerability to resource constraints , 1993, Proceedings of the 26th Annual International Symposium on Microarchitecture.
[11] Henry Hoffmann,et al. The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs , 2002, IEEE Micro.
[12] Ben Lee,et al. Performance Evaluation of Dynamic Speculative Multithreading with the Cascadia Architecture , 2010, IEEE Transactions on Parallel and Distributed Systems.
[13] Feng Liu,et al. Scalable Speculative Parallelization on Commodity Clusters , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[14] Parag A. Pathak,et al. Massachusetts Institute of Technology , 1964, Nature.
[15] Vikram S. Adve,et al. LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..
[16] Amer Diwan,et al. SUIF Explorer: an interactive and interprocedural parallelizer , 1999, PPoPP '99.
[17] D.A. Reed,et al. An Integrated Compilation and Performance Analysis Environment for Data Parallel Programs , 1995, Proceedings of the IEEE/ACM SC95 Conference.
[18] Vivek Sarkar,et al. The Raw Compiler Project , 1999 .
[19] Saturnino Garcia,et al. Parkour: Parallel Speedup Estimates for Serial Programs , 2011, HotPar.
[20] Martin Schulz,et al. A regression-based approach to scalability prediction , 2008, ICS '08.
[21] David Wentzlaff,et al. TILE64 Processor: A 64-Core SoC with Mesh Interconnect , 2010 .
[22] Daniel A. Reed,et al. SvPablo: A multi-language architecture-independent performance analysis system , 1999, Proceedings of the 1999 International Conference on Parallel Processing.
[23] James E. Smith,et al. A first-order superscalar processor model , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..
[24] Monica S. Lam,et al. Limits of control flow on parallelism , 1992, ISCA '92.
[25] Milo M. K. Martin,et al. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.
[26] Manoj Kumar,et al. Measuring Parallelism in Computation-Intensive Scientific/Engineering Applications , 1988, IEEE Trans. Computers.
[27] Gabriel H. Loh. A time-stamping algorithm for efficient performance estimation of superscalar processors , 2001, SIGMETRICS '01.
[28] David H. Bailey,et al. The NAS parallel benchmarks summary and preliminary results , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[29] Guang R. Gao,et al. On the limits of program parallelism and its smoothability , 1992, MICRO.
[30] Easwaran Raman,et al. Parallel-stage decoupled software pipelining , 2008, CGO '08.
[31] Scott A. Mahlke,et al. Uncovering hidden loop level parallelism in sequential applications , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.
[32] Lieven Eeckhout,et al. Performance prediction based on inherent program similarity , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[33] John L. Hennessy,et al. Efficient performance prediction for modern microprocessors , 2000, SIGMETRICS '00.
[34] Steven Swanson,et al. GreenDroid: A mobile application processor for a future of dark silicon , 2010, 2010 IEEE Hot Chips 22 Symposium (HCS).
[35] Saturnino Garcia,et al. Kremlin: like gprof, but for parallelization , 2011, PPoPP '11.
[36] Qin Zhao,et al. Efficient memory shadowing for 64-bit architectures , 2010, ISMM '10.
[37] Michael Bedford Taylor,et al. Tiled microprocessors , 2007 .
[38] Qin Zhao,et al. Umbra: efficient and scalable memory shadowing , 2010, CGO '10.
[39] James C. Hoe,et al. Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs? , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[40] Margaret Martonosi,et al. Integrating performance monitoring and communication in parallel computers , 1996, SIGMETRICS '96.
[41] Wenguang Chen,et al. PHANTOM: predicting performance of parallel applications on large-scale parallel machines using a single node , 2010, PPoPP '10.
[42] Keshav Pingali,et al. How much parallelism is there in irregular applications? , 2009, PPoPP '09.
[43] Hyesoon Kim,et al. SD3: A Scalable Approach to Dynamic Data-Dependence Profiling , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[44] Mark D. Hill,et al. Amdahl's Law in the Multicore Era , 2008, Computer.
[45] Nicholas Nethercote,et al. Valgrind: a framework for heavyweight dynamic binary instrumentation , 2007, PLDI '07.
[46] Rajiv Gupta,et al. Timestamped whole program path representation and its applications , 2001, PLDI '01.
[47] James R. Larus,et al. Loop-Level Parallelism in Numeric and Symbolic Programs , 1993, IEEE Trans. Parallel Distributed Syst..
[48] Anant Agarwal,et al. Scalar operand networks , 2005, IEEE Transactions on Parallel and Distributed Systems.
[49] David H. Bailey,et al. The NAS Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl..
[50] David W. Wall,et al. Limits of instruction-level parallelism , 1991, ASPLOS IV.
[51] C. Luk,et al. Prospector: A Dynamic Data-Dependence Profiler To Help Parallel Programming , 2010 .
[52] Xiangyu Zhang,et al. Alchemist: A Transparent Dependence Distance Profiling Infrastructure , 2009, 2009 International Symposium on Code Generation and Optimization.
[53] Saturnino Garcia,et al. Kremlin: rethinking and rebooting gprof for the multicore age , 2011, PLDI '11.
[54] Todd M. Austin,et al. Dynamic dependency analysis of ordinary programs , 1992, ISCA '92.
[55] Yuxiong He,et al. The Cilkview scalability analyzer , 2010, SPAA '10.
[56] J. Mark Bull,et al. A microbenchmark suite for OpenMP 2.0 , 2001, CARN.
[57] Li Zhao,et al. Exploring Large-Scale CMP Architectures Using ManySim , 2007, IEEE Micro.
[58] Saturnino Garcia,et al. Bridging the Parallelization Gap: Automating Parallelism Discovery and Planning , 2010 .
[59] Vivek Sarkar,et al. Baring It All to Software: Raw Machines , 1997, Computer.
[60] James R. Larus,et al. Exploiting hardware performance counters with flow and context sensitive profiling , 1997, PLDI '97.